Journal of Natural Language Processing (自然言語処理)
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Technical Report
Study on Supervised Learning of Vietnamese Word Sense Disambiguation Classifiers
Minh Hai Nguyen, Kiyoaki Shirai
Journal, free access

2012, Volume 19, Issue 1, pp. 25-50

Abstract

Vietnamese is often described as a language with many highly ambiguous words. However, there has been no published research on Word Sense Disambiguation (WSD hereafter) for this language. This study is the first attempt to address Vietnamese WSD. In particular, we explore which features are effective for training WSD classifiers, and we verify whether the ‘pseudoword’ technique is applicable both to investigating the effectiveness of features and to training WSD classifiers. Three tasks were conducted using two corpora: one built manually from the Vietnamese Treebank, and one built automatically by applying the pseudoword technique. Experimental results showed that the bag-of-words feature performs well for all three categories of words (verbs, nouns, and adjectives). However, combining it with POS, collocation, or syntactic features does not significantly improve the performance of WSD classifiers. Moreover, the results confirmed that the pseudoword technique is suitable for exploring the effectiveness of features in the disambiguation of Vietnamese verbs and adjectives. Finally, we empirically evaluated the applicability of the pseudoword technique as an unsupervised learning method for real Vietnamese WSD.
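
The abstract does not spell out how the pseudoword corpus is constructed. As a rough illustration only, the general pseudoword idea is to conflate two distinct words into one artificial ambiguous token and keep the original word as the "sense" label, so that labelled data comes for free. The sketch below makes several assumptions that are not taken from the paper: the function name `make_pseudoword_corpus`, the context window size, the toy whitespace-tokenized sentences, and the use of plain context-word counts as the bag-of-words feature.

```python
from collections import Counter


def make_pseudoword_corpus(sentences, word_a, word_b, window=3):
    """Build labelled examples by conflating two words into one pseudoword.

    Hypothetical helper for illustration; not the authors' implementation.
    Each occurrence of word_a or word_b becomes an instance of the
    pseudoword, and the word that actually occurred is the gold "sense".
    """
    pseudoword = f"{word_a}_{word_b}"
    examples = []
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok in (word_a, word_b):
                # Bag-of-words feature: counts of words within the window,
                # excluding the target position itself.
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                examples.append({
                    "target": pseudoword,
                    "features": Counter(left + right),
                    "sense": tok,
                })
    return examples


# Toy sentences, tokenized by whitespace (an assumption; real Vietnamese
# text would need proper word segmentation).
corpus = [
    ["tôi", "ăn", "cơm", "với", "cá"],
    ["anh", "ấy", "đọc", "sách", "mỗi", "ngày"],
]
examples = make_pseudoword_corpus(corpus, "ăn", "đọc")
```

The resulting examples could then be passed to any supervised classifier; which classifier and features the authors actually used is described in the full paper, not here.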

© 2012 The Association for Natural Language Processing