Abstract
Quantitative analysis of texts, that is, the study of the patterns that emerge when information is encoded in language, has been applied to many important documents outside Japan. It was not applied to Japanese documents, however, until the middle decades of the 20th century. The main reason for this delay lies in the following characteristic of the Japanese language.
Unlike English, Japanese does not separate words with spaces, so it is difficult for a computer to recognize where one word ends and the next begins.
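The short Python sketch below illustrates the problem. Splitting on whitespace recovers English words but returns an entire Japanese phrase as one "word". It then shows greedy longest-match segmentation, one common dictionary-based technique, applied to the opening phrase of Genji Monogatari. This is an illustration only, not the method used to build the database: the toy dictionary `DICTIONARY` and the function `longest_match_segment` are hypothetical, and a real segmenter would need a full lexicon of classical Japanese.

```python
# Whitespace splitting works for English, where spaces mark word boundaries.
english = "in the reign of a certain emperor"
print(english.split())

# The opening phrase of Genji Monogatari contains no spaces,
# so str.split() returns the whole phrase as a single "word".
japanese = "いづれの御時にか"
print(japanese.split())

# Hypothetical toy dictionary (assumption); a real system needs a full lexicon.
DICTIONARY = {"いづれ", "の", "御時", "に", "か"}
MAX_LEN = max(len(w) for w in DICTIONARY)


def longest_match_segment(text, dictionary, max_len):
    """Greedy left-to-right longest-match segmentation.

    At each position, take the longest dictionary word that starts there.
    If no dictionary word matches, emit a single character and move on.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try candidate substrings from longest to shortest.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: fall back to a one-character token.
            tokens.append(text[i])
            i += 1
    return tokens


print(longest_match_segment(japanese, DICTIONARY, MAX_LEN))
```

Greedy longest match is simple but has a known weakness: it can commit to a long word that forces a wrong split later in the sentence. This is one reason why segmenting a large classical text reliably is hard.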
The purpose of this study is to build a full-text database of Genji Monogatari suitable for quantitative analysis. Using the Genji Monogatari Taisei, published by Chuokoron-sha, as the base text, we divided every sentence of Genji Monogatari into words and attached a part-of-speech code to each word.
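The following is a minimal sketch of what one record in a segmented, part-of-speech-coded database might look like, and how such records support a simple quantitative query. Everything here is an assumption made for illustration. The `Token` fields, the `build_records` helper, the chapter label, and the POS labels (`PRON`, `PART`, `NOUN`) are hypothetical placeholders. They are not the record layout or coding scheme actually used for the database.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    chapter: str   # chapter the word occurs in (hypothetical field)
    position: int  # 0-based index of the word within its chapter
    surface: str   # the word as written in the base text
    pos: str       # part-of-speech code (hypothetical placeholder labels)


def build_records(chapter, words_with_pos):
    """Turn a segmented, POS-tagged word list into database records."""
    return [Token(chapter, i, w, p) for i, (w, p) in enumerate(words_with_pos)]


# Illustrative tagging of the opening phrase; the codes are placeholders,
# not the coding scheme actually used for the database.
opening = [("いづれ", "PRON"), ("の", "PART"), ("御時", "NOUN"),
           ("に", "PART"), ("か", "PART")]
records = build_records("Kiritsubo", opening)

# A typical quantitative query: frequency of each part of speech.
pos_counts = Counter(r.pos for r in records)
print(pos_counts.most_common())
```

Storing the position alongside each word keeps the original word order recoverable, so the same records can serve both frequency counts and analyses of word sequences.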
In this paper we report how we built the database and the difficulties we encountered in the process.