Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
An Improvement of a Morphological Analysis by a Morpheme Clustering
SHINSUKE MORI, MAKOTO NAGAO
JOURNAL FREE ACCESS

1998 Volume 5 Issue 2 Pages 75-103

Abstract

This paper proposes two improvements to a stochastic Japanese morphological analyzer: morpheme clustering and a refined unknown word model. For morpheme clustering, we propose a method that converts a morpheme-based n-gram model into a class-based n-gram model using a cross-entropy criterion. For the unknown word model, we propose a method that incorporates a given morpheme set, such as a dictionary, into the model. In experiments on the EDR corpus, both methods improved accuracy. The analyzer adopting both methods achieved higher accuracy than a previously reported part-of-speech-based trigram model, which shows that our morphological analyzer is the more accurate of the two. In addition to these experiments, we compared our analyzer with an analyzer based on a grammarian's intuition. The results show that the error rate of the stochastic analyzer was significantly lower than that of the heuristic analyzer. The stochastic approach to Japanese morphological analysis thus has a clear advantage over the ad hoc method, both in accuracy and in the ease of further systematic improvement.
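To make the idea of a class-based n-gram model and a cross-entropy criterion concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the abstract does not give the clustering algorithm, the smoothing, or the n-gram order, so the sketch assumes an unsmoothed maximum-likelihood bigram, a hand-written morpheme-to-class mapping, and toy English tokens. The names `train_class_bigram`, `cross_entropy`, and `word2class` are hypothetical. The model factors P(w | previous) as P(class(w) | class(previous)) × P(w | class(w)), and an empty mapping reduces it to a plain morpheme bigram.

```python
import math
from collections import Counter


def train_class_bigram(sentences, word2class):
    """Fit an unsmoothed MLE class bigram (toy assumption, not the paper's model).

    Tokens missing from word2class form their own singleton class, so an
    empty mapping yields an ordinary morpheme-based bigram.
    """
    class_bigrams = Counter()   # count of (previous class, current class)
    class_context = Counter()   # count of previous class as a context
    word_counts = Counter()     # count of each token
    class_counts = Counter()    # count of each class
    for sent in sentences:
        prev = "<s>"
        for w in sent + ["</s>"]:
            c = word2class.get(w, w)
            class_bigrams[(prev, c)] += 1
            class_context[prev] += 1
            word_counts[w] += 1
            class_counts[c] += 1
            prev = c

    def prob(w, prev_class):
        c = word2class.get(w, w)
        if class_context[prev_class] == 0 or class_counts[c] == 0:
            return 0.0
        transition = class_bigrams[(prev_class, c)] / class_context[prev_class]
        emission = word_counts[w] / class_counts[c]
        return transition * emission

    return prob


def cross_entropy(prob, sentences, word2class):
    """Per-token cross entropy in bits; infinite if any token gets probability 0."""
    total, n = 0.0, 0
    for sent in sentences:
        prev = "<s>"
        for w in sent + ["</s>"]:
            p = prob(w, prev)
            if p == 0.0:
                return math.inf
            total -= math.log2(p)
            n += 1
            prev = word2class.get(w, w)
    return total / n


train = [["dog", "runs"], ["cat", "walks"]]
heldout = [["dog", "walks"]]  # bigram unseen in training

no_classes = {}  # every morpheme is its own class
classes = {"dog": "N", "cat": "N", "runs": "V", "walks": "V"}  # hypothetical clustering

h_word = cross_entropy(train_class_bigram(train, no_classes), heldout, no_classes)
h_class = cross_entropy(train_class_bigram(train, classes), heldout, classes)
print(h_word, h_class)
```

In this toy setting the morpheme-based model assigns zero probability to the unseen pair, while the clustered model generalizes across class members and yields a finite cross entropy. A clustering procedure guided by cross entropy would, under these assumptions, prefer the second mapping; how candidate clusterings are generated and evaluated in the paper itself is not specified in the abstract.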

© The Association for Natural Language Processing