Japanese Morphological Analysis Using a Composite N-gram of Parts of Speech and Variable-Length Morpheme Sequences

政瀧 浩和; 匂坂 芳典

doi:10.5715/jnlp.6.2_41

Abstract

This paper proposes a Japanese morphological analyzer based on a composite N-gram of parts of speech (POS) and morpheme sequences (Composite N-gram). The composite N-gram is an N-gram language model whose units are POS classes, morphemes, and morpheme sequences, and it provides strong predictive ability even when trained on a small corpus. To handle unknown words, we extended the composite N-gram to take into account the probability that an unknown word is generated from each POS class. Experimental results showed that morpheme accuracy with the composite N-gram reached a maximum of 99.17%, higher than that of a conventional rule-based method. When pronunciation was also included in the evaluation, the accuracy was 98.68%. When the analyzer was applied to sentences containing unknown words, morpheme accuracy fell by only about 0.8%.
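
As a rough illustration of the unknown-word idea described in the abstract, the sketch below scores a tagged morpheme sequence with a plain class bigram and falls back to a per-class unknown-word probability when a word is not in the lexicon. It is not the authors' model: the composite N-gram also uses individual morphemes and morpheme sequences as units, which this sketch omits. All names (`class_bigram`, `word_given_class`, `unknown_given_class`, `log_prob`) and all probability values are hypothetical, chosen only for the example.

```python
import math

# Hypothetical toy parameters; none of these values come from the paper.
# P(class_i | class_{i-1}) over POS classes, with "<s>" as sentence start.
class_bigram = {
    ("<s>", "noun"): 0.6,
    ("noun", "particle"): 0.7,
    ("particle", "verb"): 0.5,
}

# P(word | class) for words seen in training.
word_given_class = {
    ("猫", "noun"): 0.01,
    ("が", "particle"): 0.2,
    ("走る", "verb"): 0.005,
}

# P(unknown word | class): mass assumed to be reserved per class for unseen words.
unknown_given_class = {"noun": 0.001, "particle": 0.0001, "verb": 0.0005}


def log_prob(tagged):
    """Log probability of a sequence of (word, POS class) pairs under a class bigram.

    Words missing from the lexicon are scored with the unknown-word
    probability of their POS class instead of P(word | class).
    """
    total = 0.0
    prev = "<s>"
    for word, cls in tagged:
        emit = word_given_class.get((word, cls), unknown_given_class[cls])
        total += math.log(class_bigram[(prev, cls)]) + math.log(emit)
        prev = cls
    return total


known = [("猫", "noun"), ("が", "particle"), ("走る", "verb")]
with_unknown = [("ネコ", "noun"), ("が", "particle"), ("走る", "verb")]
```

With these toy numbers, `log_prob(with_unknown)` is lower than `log_prob(known)` because the unseen noun is scored with the smaller unknown-word probability, yet the sequence still receives a finite score, so an analyzer could still rank candidate segmentations that contain out-of-vocabulary morphemes.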
