Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
The Use of Domain-Specific Statistical Data for Japanese Morphological Analysis
SHIHO NOBESAWA, KENGO SATO, HIROAKI SAITO
JOURNAL FREE ACCESS

2002 Volume 9 Issue 3 Pages 21-40

Abstract
We propose two methods for recognizing unknown strings in dictionary-based natural language processing systems. The first uses statistical information dynamically during processing; the second extracts meaningful strings that should be added to the dictionary. Both methods rely on statistical information drawn from a training corpus and require no part-of-speech tagging or other preprocessing of that corpus. We applied the methods to a Japanese morphological analysis system and obtained good results in reducing both unknown words and over-segmentation.
© The Association for Natural Language Processing