Proceedings of the Research Meeting of the Japan Society of Information and Knowledge
Online ISSN : 2432-9908
ISSN-L : 2432-9908
Japan Society of Information and Knowledge, Proceedings of the 1st (FY1993) Research Meeting

A Full-Text Database of the Genji Monogatari Taisei
上田 裕一*, 上田 英代, 樺島 忠夫, 村上 征勝

pp. 33-36

Abstract
 Quantitative analysis of texts, that is, the study of the patterns formed when information is encoded in language, has been applied to many important documents outside Japan. It was not applied to Japanese documents, however, until the middle decades of the 20th century. The main reason for this delay is the following characteristic of the Japanese language.
 Unlike English, Japanese does not separate words with spaces, so it is difficult for a computer to recognize word boundaries.
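To make the word-boundary problem concrete, here is a minimal sketch of greedy longest-match segmentation, one common dictionary-based baseline for unspaced text. The abstract does not say how the authors segmented the text, so this is not their method. The function name `longest_match_segment` and the tiny lexicon are hypothetical; the sample phrase いづれの御時にか is the opening of Genji Monogatari.

```python
from __future__ import annotations


def longest_match_segment(text: str, lexicon: set[str]) -> list[str]:
    """Greedy longest-match segmentation of unspaced text.

    At each position, take the longest lexicon entry that matches there;
    if nothing matches, emit a single character and move on.
    """
    max_len = max((len(w) for w in lexicon), default=1)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try candidate substrings from longest to shortest.
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in lexicon:
                tokens.append(text[i:i + n])
                i += n
                break
        else:
            # No lexicon entry starts here: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical toy lexicon; a real system would need a full dictionary.
lexicon = {"いづれ", "の", "御時", "に", "か"}
print(longest_match_segment("いづれの御時にか", lexicon))
# ['いづれ', 'の', '御時', 'に', 'か']
```

The result depends entirely on the lexicon: if it also contained a hypothetical entry にか, the last two characters would be returned as one token instead of two, which shows why boundary decisions for unspaced text are not trivial.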
 The purpose of this study is to build a full-text database of Genji Monogatari suitable for quantitative analysis. Using the Genji Monogatari Taisei published by Chuokoron-sha as the base text, we segmented every sentence of Genji Monogatari into words and tagged each word with a part-of-speech code.
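A database that pairs each word with a part-of-speech code supports simple frequency counts directly. The sketch below shows one possible record layout and a part-of-speech frequency count. The `Token` fields (including the chapter and sentence positions), the function `pos_frequencies`, and code labels such as "particle" are assumptions for illustration, since the abstract does not describe the actual record format or coding scheme.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One word of the segmented text (hypothetical record layout)."""
    chapter: int   # hypothetical chapter number
    sentence: int  # hypothetical sentence index within the chapter
    surface: str   # the word as it appears in the text
    pos: str       # part-of-speech code (labels here are illustrative)


def pos_frequencies(tokens: list[Token]) -> Counter[str]:
    """Count how many tokens carry each part-of-speech code."""
    return Counter(t.pos for t in tokens)


# Usage: the opening phrase いづれの御時にか, tagged with illustrative codes.
tokens = [
    Token(1, 1, "いづれ", "pronoun"),
    Token(1, 1, "の", "particle"),
    Token(1, 1, "御時", "noun"),
    Token(1, 1, "に", "particle"),
    Token(1, 1, "か", "particle"),
]
print(pos_frequencies(tokens).most_common(1))  # [('particle', 3)]
```

A frozen dataclass keeps each record immutable, and `collections.Counter` is simply one convenient way to build a frequency table from the tagged words.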
 This paper reports how we built the database and the difficulties we encountered in the process.
© 1993 Japan Society of Information and Knowledge