Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Japanese Morphological Analysis using Composite Part-of-speech and Morpheme Sequence N-gram
HIROKAZU MASATAKI, YOSHINORI SAGISAKA
JOURNAL FREE ACCESS

1999 Volume 6 Issue 2 Pages 41-57

Abstract
This paper proposes a Japanese morphological analyzer based on a composite part-of-speech (POS) and morpheme sequence N-gram (composite N-gram). The composite N-gram is an N-gram language model whose units are POS classes, morphemes, and morpheme sequences, which gives excellent predictive ability even from a small corpus. To handle unknown words, we extended the composite N-gram to take into account the probability that an unknown word is generated from each POS class. Experimental results showed that morpheme accuracy with the composite N-gram reached a maximum of 99.17%, which was better than that of a conventional rule-based method. When pronunciation was also included in the evaluation, the accuracy was 98.68%. When the analyzer was applied to sentences containing unknown words, morpheme accuracy fell by only about 0.8%.
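As a rough illustration of the idea in the abstract, the following minimal sketch scores one candidate analysis under a bigram over mixed units, where a unit is either a POS class or a lexical item, and each POS class reserves some probability for unknown words. This is not the authors' implementation: the tables `UNIT_BIGRAM` and `CLASS_EMISSION`, the `<unk>` key, the function names, and every probability value are hypothetical and chosen only to make the example runnable.

```python
import math

# Toy, made-up probabilities; none of these values come from the paper.
# A unit is either a POS class ("NOUN", "VERB") or a lexical item ("は").
# A morpheme-sequence unit would be treated like a lexical item here.
UNIT_BIGRAM = {
    ("<s>", "NOUN"): 0.6,
    ("NOUN", "は"): 0.5,
    ("は", "VERB"): 0.4,
}

# P(morpheme | POS class) for known morphemes. The "<unk>" entry is the
# probability assumed for an unknown word appearing from that class.
CLASS_EMISSION = {
    "NOUN": {"猫": 0.2, "<unk>": 0.1},
    "VERB": {"走る": 0.3, "<unk>": 0.05},
}


def emission_prob(surface, unit):
    """Return P(surface | unit); lexical units emit themselves with probability 1."""
    table = CLASS_EMISSION.get(unit)
    if table is None:
        return 1.0 if surface == unit else 0.0
    # Fall back to the class's unknown-word probability for unseen morphemes.
    return table.get(surface, table["<unk>"])


def sequence_logprob(analysis):
    """Log-probability of a list of (surface, unit) pairs under the unit bigram."""
    logp = 0.0
    prev = "<s>"
    for surface, unit in analysis:
        p = UNIT_BIGRAM.get((prev, unit), 0.0) * emission_prob(surface, unit)
        if p == 0.0:
            return float("-inf")
        logp += math.log(p)
        prev = unit
    return logp


known = sequence_logprob([("猫", "NOUN"), ("は", "は"), ("走る", "VERB")])
unknown = sequence_logprob([("タマ", "NOUN"), ("は", "は"), ("走る", "VERB")])
```

In this sketch the unknown noun "タマ" still receives a finite score through the `<unk>` entry of its class, so an analysis containing it can be ranked against alternatives rather than rejected. A full analyzer would additionally search over all segmentations of the input, for example with a Viterbi search, which is omitted here.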
© The Association for Natural Language Processing