Abstract
This paper proposes a Japanese morphological analyzer based on a composite part-of-speech (POS) and morpheme-sequence N-gram (composite N-gram). The composite N-gram is an N-gram language model whose units are POS classes, morphemes, and morpheme sequences, and it offers strong predictive ability even when trained on a small corpus. To handle unknown words, we extended the composite N-gram to account for the probability that an unknown word is generated from each POS class. Experimental results showed that morpheme accuracy with the composite N-gram reached a maximum of 99.17%, higher than that of a conventional rule-based method. When pronunciation was also included in the evaluation, the accuracy was 98.68%. When the analyzer was applied to sentences containing unknown words, morpheme accuracy fell by only about 0.8%.
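The scoring idea summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a bigram over units, a self-emitting lexical unit for morphemes and morpheme sequences, and a uniform per-character model for unknown words. All names (`UNIT_BIGRAM`, `UNKNOWN_GIVEN_CLASS`, `path_log_prob`, and so on) and all probabilities are hypothetical toy values.

```python
import math

# Hypothetical toy parameters; none of these numbers come from the paper.
# A unit label found in CLASSES is a POS class. Any other label is a lexical
# unit (a morpheme or a morpheme sequence) that emits itself with probability 1.
CLASSES = {"NOUN", "VERB", "PARTICLE"}

UNIT_BIGRAM = {  # P(unit_i | unit_{i-1})
    ("<s>", "NOUN"): 0.6,
    ("NOUN", "が"): 0.4,
    ("が", "VERB"): 0.5,
}
WORD_GIVEN_CLASS = {  # P(word | POS class) for known words
    ("NOUN", "猫"): 0.02,
    ("VERB", "走る"): 0.01,
}
# P(unknown word | POS class): open classes are assumed to produce more unknowns.
UNKNOWN_GIVEN_CLASS = {"NOUN": 0.05, "VERB": 0.01, "PARTICLE": 0.0001}
CHAR_PROB = 1.0 / 3000   # assumed uniform per-character probability
BIGRAM_FLOOR = 1e-6      # assumed floor for unseen unit transitions


def emission_log_prob(surface, unit):
    """Return log P(surface | unit)."""
    if unit not in CLASSES:
        # Lexical unit: it can only emit its own surface form.
        return 0.0 if surface == unit else float("-inf")
    p = WORD_GIVEN_CLASS.get((unit, surface))
    if p is None:
        # Unknown word: class-dependent unknown probability times a spelling model.
        p = UNKNOWN_GIVEN_CLASS[unit] * CHAR_PROB ** len(surface)
    return math.log(p)


def path_log_prob(path):
    """Score one candidate analysis given as [(surface, unit), ...]."""
    total = 0.0
    prev = "<s>"
    for surface, unit in path:
        transition = UNIT_BIGRAM.get((prev, unit), BIGRAM_FLOOR)
        total += math.log(transition) + emission_log_prob(surface, unit)
        prev = unit
    return total


known = [("猫", "NOUN"), ("が", "が"), ("走る", "VERB")]
unknown = [("タピオカ", "NOUN"), ("が", "が"), ("走る", "VERB")]
print(path_log_prob(known), path_log_prob(unknown))
```

In a full analyzer, a lattice search such as Viterbi would compare scores like these across all candidate segmentations. The sketch scores only a single given path.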