Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Acquisition Method of Unknown Word's Morpheme Dictionary Information Using Word's Juxtapositional Relationships
CHUL-JAE PARK, KATSUHIKO KAKEHI

1997 Volume 4 Issue 1 Pages 71-86

Abstract
This paper describes an inference method for acquiring morpheme information about unknown words from a large corpus. The method comprises three functions: inferring a morpheme's part of speech, conjugation type, and conjugation (we call these morpheme attributes in this paper); updating the inferred morpheme attributes with probability factors derived from a large corpus; and inferring Japanese-language morphemes. The conjunctive relationships between words in a sentence are used to infer the morpheme attributes of an unknown word. Since a Japanese sentence is a sequence of characters with no blank spaces to mark word boundaries, the system must identify word boundaries itself. To do this, it first applies character-type sequence rules to find the cardinal points of a partition. It then infers morphemes from the partition using the morphemes in its dictionary. In the initial stage, the system's dictionary is complete only for a few special parts of speech (particles and auxiliary verbs). Morphemes are then selected according to the result of this morpheme attribute inference process. Based on these concepts, we developed a Japanese morpheme information acquisition system. Our experiments were conducted on a large corpus of 240,000 morphemes, composed of ASAHI newspaper editorials covering a six-month period. The morpheme inference accuracy was 90.5% for inflected words and 95.2% for other parts of speech, and the overall average accuracy was 94.6%. A total of 15,523 unique headwords were obtained automatically from 228,450 inferred morphemes.
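The partitioning step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual character-type sequence rules, so everything here is an assumption: the coarse type classes, the rule "cut wherever the character type changes", the names `char_type`, `split_by_char_type`, and `mark_known`, and the tiny `FUNCTION_WORDS` set standing in for the initial particle and auxiliary-verb dictionary.

```python
from itertools import groupby

# Hypothetical stand-in for the initial dictionary of function words.
# These are common Japanese particles; the paper's real dictionary is not given.
FUNCTION_WORDS = {"を", "が", "は", "に", "の", "で", "と", "も"}


def char_type(ch: str) -> str:
    """Classify one character into a coarse type by Unicode code point."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "alpha"
    return "other"


def split_by_char_type(text: str) -> list[tuple[str, str]]:
    """Cut the text wherever the character type changes (assumed rule)."""
    return [(ctype, "".join(chars)) for ctype, chars in groupby(text, key=char_type)]


def mark_known(segments: list[tuple[str, str]]) -> list[tuple[str, bool]]:
    """Flag each segment as known (in the function-word set) or unknown."""
    return [(surface, surface in FUNCTION_WORDS) for _, surface in segments]


if __name__ == "__main__":
    segments = split_by_char_type("新聞を読む")
    for surface, known in mark_known(segments):
        print(surface, "known" if known else "unknown")
```

In this sketch the sentence 新聞を読む is cut into 新聞, を, 読, and む, and only を is recognised. The verb 読む is split between its kanji stem and its hiragana ending, which suggests why character-type cuts can serve only as candidate partition points and why a further inference step using dictionary morphemes and attributes is needed.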
© The Association for Natural Language Processing