Proceedings of Annual Conference, Japan Society of Information and Knowledge
Online ISSN : 2432-9908
ISSN-L : 2432-9908
Proceedings of the 1st Workshop on the Japan Society of Information and Knowledge

The Full-Text Database of Genji Monogatari Taisei
Y. Ueda*, H. Ueda, T. Kabashima, M. Murakami

Pages 33-36

Abstract
 The quantitative analysis of sentences, the study of patterns that arise when information is encoded in language, has been applied to many important documents outside Japan. It was not applied to Japanese documents, however, until the middle decades of the 20th century. The main reason for this delay is a characteristic of written Japanese.
 Unlike English, Japanese does not separate words with spaces, so it is difficult for a computer to recognize word boundaries.
 The purpose of this study is to build a full-text database of Genji Monogatari suitable for quantitative analysis. Using the Genji Monogatari Taisei published by Chuokoron-sha as the source text, we divided every sentence of Genji Monogatari into words and attached a part-of-speech code to each word.
 In this paper we report how we built this database and the difficulties we encountered in the process.
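
 As a rough illustration of the kind of record such a database might hold, the following minimal sketch stores each word with a part-of-speech code and counts the codes. The `surface/POS` line format, the code names (`PRON`, `PART`, `NOUN`), and the segmentation of the sample phrase are assumptions made for this example only; they are not the encoding used by the authors.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One segmented word with its part-of-speech code (hypothetical format)."""
    surface: str
    pos: str


def parse_line(line: str, sep: str = "/") -> list[Token]:
    """Parse a whitespace-delimited line of 'surface/POS' fields into tokens."""
    tokens = []
    for field in line.split():
        surface, _, pos = field.partition(sep)
        tokens.append(Token(surface, pos))
    return tokens


def pos_frequencies(tokens: list[Token]) -> Counter:
    """Count how often each part-of-speech code occurs."""
    return Counter(token.pos for token in tokens)


# Illustrative segmentation of a short phrase; the codes are assumptions.
sample = "いづれ/PRON の/PART 御時/NOUN に/PART か/PART"
tokens = parse_line(sample)
frequencies = pos_frequencies(tokens)
```

 Once word boundaries are stored explicitly in this way, counts such as `frequencies` can be computed without the computer having to find the boundaries itself.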
© 1993 Japan Society of Information and Knowledge