Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Analysis of Japanese Compound Nouns by Direct Text Scanning
TORU HISAMITSU, YOSHIHIKO NITTA

1998 Volume 5 Issue 4 Pages 35-60

Abstract
Compound nouns tend to be important words because they convey a large amount of information and can even summarize a document. The analysis of compound nouns can therefore contribute to machine translation, information extraction, and information retrieval. Since compound nouns lack syntactic clues, existing methods have relied on manually written rules and thesauri to analyze the word dependency structure within a compound noun. Consequently, these methods lack robustness when applied to open corpora such as newspaper articles, which contain many unregistered words. This paper presents a thesaurus-free, corpus-based approach that scans a corpus with a set of templates and extracts co-occurrence data for the nouns that make up the compound noun. Unregistered words, such as abbreviations and short compound nouns, are detected during template matching, and co-occurrence data for the newly found words are extracted as well, which makes the analysis both robust and accurate. The accuracy of the method was evaluated using 400 compound nouns of length 5, 6, 7, and 8. The numbers of correct analyses were 90, 86, 84, and 84 out of 100 compound nouns of length 5, 6, 7, and 8, respectively.
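The general idea of template-based scanning can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the templates, the scoring function, or the search procedure, so the two patterns ("A の B" and directly adjacent "AB"), the additive score, the head-final rule, and the names `collect_cooccurrence` and `best_structure` are hypothetical. The sketch also assumes the constituent nouns are already known and omits the detection of unregistered words.

```python
import re
from collections import Counter
from functools import lru_cache


def collect_cooccurrence(corpus, nouns):
    """Scan raw text with two toy templates and count (modifier, head) pairs.

    The templates are illustrative assumptions, not the paper's template set.
    """
    alt = "|".join(re.escape(n) for n in nouns)
    templates = [
        re.compile(f"(?=({alt})の({alt}))"),  # "A の B" (A's B)
        re.compile(f"(?=({alt})({alt}))"),    # directly adjacent "AB"
    ]
    cooc = Counter()
    for sentence in corpus:
        for template in templates:
            # The lookahead lets overlapping matches be counted.
            for m in template.finditer(sentence):
                cooc[(m.group(1), m.group(2))] += 1
    return cooc


def best_structure(words, cooc):
    """Return (score, tree) for the best binary bracketing of `words`.

    Assumed scoring: at every split, add the co-occurrence count between the
    head of the left part and the head of the right part, where the head of a
    span is taken to be its last word (Japanese compounds are head-final).
    """
    words = tuple(words)

    @lru_cache(maxsize=None)
    def solve(i, j):  # covers words[i:j]
        if j - i == 1:
            return 0, words[i]
        best = None
        for k in range(i + 1, j):
            left_score, left_tree = solve(i, k)
            right_score, right_tree = solve(k, j)
            score = left_score + right_score + cooc[(words[k - 1], words[j - 1])]
            if best is None or score > best[0]:
                best = (score, (left_tree, right_tree))
        return best

    return solve(0, len(words))


# Toy corpus, invented for this example only.
corpus = ["自然言語を処理する", "言語の処理は難しい", "自然言語の研究", "言語処理の技術"]
nouns = ["自然", "言語", "処理"]

cooc = collect_cooccurrence(corpus, nouns)
score, tree = best_structure(nouns, cooc)
print(score, tree)
```

On this toy corpus the sketch prefers the left-branching structure ((自然 言語) 処理), because 自然 co-occurs with 言語 but never directly with 処理. A real system would need segmentation, a much larger corpus, and a normalized association measure rather than raw counts.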
© The Association for Natural Language Processing