CRL at the SENSEVAL2J Dictionary Task: A Comparison of Various Machine Learning Methods and Features for Japanese Word Sense Disambiguation

村田 真樹; 内山 将夫; 内元 清貴; 馬 青; 井佐原 均

doi:10.5715/jnlp.10.3_115

Abstract

This paper describes our work on the Japanese dictionary-based lexical-sample task of Senseval-2, in which we compared various machine learning methods and features. We submitted four systems to the contest: i) a support vector machine method, ii) a simple Bayes method, iii) a method combining a support vector machine method and a simple Bayes method, and iv) a method combining two kinds of support vector machine methods and two kinds of simple Bayes methods. The combined methods achieved the best precision (0.786) among all the systems submitted to the contest. After the contest, we tuned the parameter used in the simple Bayes method, which improved its precision. The best-performing system is now the method combining the two simple Bayes methods, with a precision of 0.793. We also discuss experiments in which we varied the features used, and we examine the effectiveness and characteristics of each feature. These experiments led to an interesting conclusion: good precision can be obtained using only string features, that is, strings of 1-gram to 3-gram immediately before and after the analyzed morpheme. Finally, we describe related work that is useful for future work on word sense disambiguation.
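
To make the notion of string features concrete, the following is a minimal sketch of how 1-gram to 3-gram strings immediately before and after a target morpheme might be extracted. It assumes that the n-grams are counted in characters and that the target is given as a character span; the function name `string_features` and the `L{n}=` / `R{n}=` feature labels are hypothetical and are not taken from the paper.

```python
def string_features(text: str, start: int, end: int, max_n: int = 3) -> list[str]:
    """Collect character n-grams (n = 1..max_n) adjacent to text[start:end].

    Assumption: "string features" are character n-grams; the labels
    "L{n}=" (left context) and "R{n}=" (right context) are illustrative.
    """
    features = []
    for n in range(1, max_n + 1):
        left = text[max(0, start - n):start]   # n characters just before the target
        right = text[end:end + n]              # n characters just after the target
        # Skip contexts that are cut short by the sentence boundary.
        if len(left) == n:
            features.append(f"L{n}={left}")
        if len(right) == n:
            features.append(f"R{n}={right}")
    return features


# Example: the target morpheme "本" occupies characters 2..3 of the sentence.
example = string_features("彼は本を読む", 2, 3)
```

Features of this kind could then be passed to any of the classifiers mentioned in the abstract, such as a simple Bayes method or a support vector machine.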
