Journal of Natural Language Processing (自然言語処理)
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
A Comparative Study on Effective Context Selection for Distributional Similarity
Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama

2008, Volume 15, Issue 5, pp. 119-150

Abstract

Distributional similarity is a widely adopted concept for capturing the semantic relatedness of words based on their contexts in various NLP tasks. While accurate similarity calculation requires a huge number of context types and co-occurrences, the contribution to the similarity calculation varies across individual context types, and some of them even act as noise. To select well-performing contexts and alleviate the high computational cost, we propose three context selection schemes and investigate their effectiveness: category-based, type-based, and co-occurrence-based selection. Category-based selection is the conventional and simplest method, which limits the context types based on their syntactic category. The finer-grained type-based selection assigns an importance score to each context type, which we make possible by proposing a novel formalization of distributional similarity as a classification problem and applying feature selection techniques. The finest-grained co-occurrence-based selection assigns an importance score to each co-occurrence of a word and a context type. We evaluate the effectiveness of each scheme and the trade-off between co-occurrence data size and synonym acquisition performance. Our experiments show that, on the whole, the finest-grained co-occurrence-based selection achieves better performance, although some of the simple category-based selection schemes show a comparable performance/cost trade-off.
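
As a rough illustration of the finest-grained scheme described above, the following sketch scores each (word, context type) co-occurrence, discards the low-scoring ones, and compares words by the cosine of the remaining context vectors. The abstract does not specify the scoring functions used in the paper, so the pointwise mutual information (PMI) score, the threshold of 0, the toy counts, and all function names here are assumptions made for illustration only.

```python
import math
from collections import Counter

# Toy (word, context type) co-occurrence counts, invented for illustration.
cooc = Counter({
    ("car", "obj_of:drive"): 8,
    ("car", "mod:red"): 3,
    ("car", "obj_of:see"): 1,
    ("automobile", "obj_of:drive"): 6,
    ("automobile", "mod:red"): 2,
    ("automobile", "obj_of:see"): 1,
    ("apple", "obj_of:eat"): 7,
    ("apple", "mod:red"): 4,
    ("apple", "obj_of:see"): 1,
})


def pmi_scores(cooc):
    """Score every co-occurrence with PMI (an assumed importance score)."""
    total = sum(cooc.values())
    word_totals, ctx_totals = Counter(), Counter()
    for (word, ctx), n in cooc.items():
        word_totals[word] += n
        ctx_totals[ctx] += n
    return {
        (word, ctx): math.log(n * total / (word_totals[word] * ctx_totals[ctx]))
        for (word, ctx), n in cooc.items()
    }


def select_cooccurrences(scores, threshold=0.0):
    """Keep only co-occurrences whose score exceeds the (assumed) threshold."""
    return {pair: s for pair, s in scores.items() if s > threshold}


def context_vector(word, scores):
    """Collect the surviving context scores of one word as a sparse vector."""
    return {ctx: s for (w, ctx), s in scores.items() if w == word}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[c] * v[c] for c in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


scores = pmi_scores(cooc)
selected = select_cooccurrences(scores)

sim_car_automobile = cosine(context_vector("car", selected),
                            context_vector("automobile", selected))
sim_car_apple = cosine(context_vector("car", selected),
                       context_vector("apple", selected))
print(len(scores), len(selected), sim_car_automobile, sim_car_apple)
```

In this toy data the selection step reduces the stored co-occurrences while "car" remains closer to "automobile" than to "apple", which mirrors the data size versus performance trade-off the abstract evaluates. Category-based or type-based selection would instead filter on the context type alone, before any per-word scores are computed.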

© The Association for Natural Language Processing