Information and Media Technologies
Online ISSN : 1881-0896
ISSN-L : 1881-0896
Media (processing) and Interaction
Using Semi-supervised Learning for Question Classification
Tri Thanh Nguyen, Le Minh Nguyen, Akira Shimazu
Journal, free access

2008, Volume 3, Issue 1, pp. 112-130

Abstract

Question classification, an important phase in question answering systems, is the task of identifying the type of a given question from a set of predefined types. This study combines unlabeled questions with labeled questions in semi-supervised learning to improve the precision of the question classification task. As the semi-supervised algorithm, we selected Tri-training because it is a simple but efficient co-training style algorithm. However, Tri-training is not well suited to question data, so we propose two modifications to make it more suitable. To give its three classifiers different initial hypotheses, Tri-training bootstrap-samples the original labeled set to obtain a different training set for each classifier, and this bootstrap sampling decreases the precision of the three classifiers. To avoid this drawback, each classifier should initially be trained on the original labeled set while the diversity of the three classifiers is still ensured. Our first proposal is therefore to use multiple algorithms for the classifiers in Tri-training; our second proposal is to use multiple algorithms for the classifiers in combination with multiple views. Our experiments show promising results.
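
The first proposal can be illustrated with a minimal sketch in which three different learning algorithms are each trained on the full labeled set, and in every round the questions on which two classifiers agree are added as pseudo-labeled training data for the third. This is a simplified illustration under stated assumptions, not the authors' implementation: the three toy classifiers, the sample questions, the labels, and the function names `tri_train` and `vote` are all hypothetical, and the sketch omits any safeguards the full Tri-training algorithm applies before accepting pseudo-labels.

```python
from collections import Counter, defaultdict
import math

class FirstWordClassifier:
    """Toy learner: most common label seen for the question's first word."""
    def fit(self, questions, labels):
        self.default = Counter(labels).most_common(1)[0][0]
        self.by_word = defaultdict(Counter)
        for q, y in zip(questions, labels):
            self.by_word[q.lower().split()[0]][y] += 1
        return self
    def predict(self, q):
        counts = self.by_word.get(q.lower().split()[0])
        return counts.most_common(1)[0][0] if counts else self.default

class NaiveBayesClassifier:
    """Toy learner: multinomial naive Bayes with add-one smoothing."""
    def fit(self, questions, labels):
        self.prior = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for q, y in zip(questions, labels):
            self.word_counts[y].update(q.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self
    def predict(self, q):
        def score(y):
            total = sum(self.word_counts[y].values()) + len(self.vocab)
            s = math.log(self.prior[y])
            for w in q.lower().split():
                s += math.log((self.word_counts[y][w] + 1) / total)
            return s
        return max(self.prior, key=score)

class NearestNeighborClassifier:
    """Toy learner: 1-nearest neighbour by Jaccard overlap of word sets."""
    def fit(self, questions, labels):
        self.examples = [(set(q.lower().split()), y)
                         for q, y in zip(questions, labels)]
        return self
    def predict(self, q):
        words = set(q.lower().split())
        def jaccard(example):
            return len(words & example[0]) / len(words | example[0])
        return max(self.examples, key=jaccard)[1]

def tri_train(classifiers, labeled, unlabeled, rounds=3):
    """Simplified tri-training over exactly three classifiers.

    Assumption: no bootstrap sampling; diversity comes only from using
    three different algorithms, each trained on the full labeled set.
    """
    questions, labels = zip(*labeled)
    for clf in classifiers:
        clf.fit(questions, labels)
    for _ in range(rounds):
        # Gather pseudo-labels for all three classifiers before retraining any.
        extras = []
        for i in range(3):
            a, b = (classifiers[j] for j in range(3) if j != i)
            extras.append([(q, a.predict(q)) for q in unlabeled
                           if a.predict(q) == b.predict(q)])
        # Retrain each classifier on the labeled set plus its pseudo-labels.
        for clf, extra in zip(classifiers, extras):
            qs, ys = zip(*(list(labeled) + extra))
            clf.fit(qs, ys)
    return classifiers

def vote(classifiers, question):
    """Final prediction by majority vote of the three classifiers."""
    return Counter(c.predict(question) for c in classifiers).most_common(1)[0][0]

# Hypothetical toy data, for illustration only.
labeled = [
    ("who wrote hamlet", "HUMAN"),
    ("who discovered penicillin", "HUMAN"),
    ("where is tokyo", "LOCATION"),
    ("where is the nile river", "LOCATION"),
    ("how many moons does mars have", "NUMERIC"),
    ("how many people live in japan", "NUMERIC"),
]
unlabeled = [
    "who painted the mona lisa",
    "where is mount fuji",
    "how many legs does a spider have",
]
models = tri_train(
    [FirstWordClassifier(), NaiveBayesClassifier(), NearestNeighborClassifier()],
    labeled, unlabeled)
print(vote(models, "who invented the telephone"))
```

The second proposal would additionally give each classifier its own view (feature representation) of a question; in this sketch that would correspond to each toy classifier extracting different features from the question text, which is again an assumption about one possible realization rather than a description of the paper's setup.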

© 2008 by The Association for Natural Language Processing