Improving the Accuracy of Morphological Analysis by Incorporating Domain-Specific String Information

延澤 志保; 佐藤 健吾; 斎藤 博昭

doi:10.5715/jnlp.9.3_21

Abstract

We propose two methods for recognizing unknown strings in dictionary-based natural language processing systems. The first uses statistical information dynamically during processing; the second extracts meaningful strings that should be added to the dictionary. Both methods rely on statistical information drawn from a training corpus, which requires no part-of-speech tagging or other preprocessing. We applied the methods to a Japanese morphological analysis system and obtained good results in reducing both unknown words and over-segmentation.
