Transactions of the Japanese Society for Artificial Intelligence (人工知能学会論文誌)
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
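Original Paper
Improving the Accuracy of Bilingual Lexicon Extraction from Comparable Corpora by Suppressing Hubs
Yutaro Shigeto, Ikumi Suzuki, Kazuo Hara, Masashi Shimbo, Yuji Matsumoto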
Journal, free access

2016, Volume 31, Issue 2, pp. E-F43_1-12

Abstract

Most existing approaches to bilingual lexicon extraction (BLE) first map words in the source and target languages into a single vector space and then measure the similarity of words across the two languages in this space. We point out that existing BLE methods suffer from the so-called hubness phenomenon: a small number of translation candidates (hub candidates) are chosen by the systems as likely translations of many source words, which consequently degrades the accuracy of the extracted translations. We show that this phenomenon can be alleviated by centering the data or by using the mutual proximity measure, two known techniques that effectively reduce hubness in standard nearest-neighbor search settings. Our empirical evaluation shows that naive nearest-neighbor search combined with these methods outperforms a recently proposed BLE method based on label propagation.
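
To make the two hubness-reduction techniques named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes that source and target words are already embedded as rows of two NumPy arrays in one shared space. The function names, the choice of the pooled mean as the centroid, the independence-based bipartite form of mutual proximity, and the synthetic data in `demo` are all illustrative assumptions rather than details taken from the paper. `k_occurrence` counts how often each translation candidate appears in the top-k lists of the source words, which is one common way to inspect hubness.

```python
import numpy as np


def cosine_similarity(source, target):
    """Cosine similarity between every source row and every target row."""
    s = source / np.linalg.norm(source, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    return s @ t.T


def centered_cosine_similarity(source, target):
    """Cosine similarity after subtracting a shared centroid.

    Assumption: the centroid is the mean of the pooled source and target
    vectors; the paper may define it differently.
    """
    centroid = np.vstack([source, target]).mean(axis=0)
    return cosine_similarity(source - centroid, target - centroid)


def mutual_proximity(sim):
    """Rescale an (n_source, n_target) similarity matrix.

    Assumption: an empirical variant that treats the two directions as
    independent. Entry (i, j) is the fraction of targets ranked below
    target j for source i, times the fraction of sources ranked below
    source i for target j. Ties are broken by position.
    """
    n_source, n_target = sim.shape
    # Applying argsort twice turns each value into its 0-based rank.
    row_rank = sim.argsort(axis=1, kind="stable").argsort(axis=1, kind="stable")
    col_rank = sim.argsort(axis=0, kind="stable").argsort(axis=0, kind="stable")
    return (row_rank / n_target) * (col_rank / n_source)


def k_occurrence(sim, k=1):
    """Count how often each target appears in a source word's top-k list."""
    top_k = np.argsort(-sim, axis=1, kind="stable")[:, :k]
    return np.bincount(top_k.ravel(), minlength=sim.shape[1])


def demo(seed=0):
    """Print the largest 1-occurrence count on synthetic vectors."""
    rng = np.random.default_rng(seed)
    # Hypothetical data: random vectors with a shared offset.
    source = rng.normal(size=(200, 50)) + 1.0
    target = rng.normal(size=(300, 50)) + 1.0
    raw = cosine_similarity(source, target)
    for name, sim in [
        ("raw cosine", raw),
        ("centered", centered_cosine_similarity(source, target)),
        ("mutual proximity", mutual_proximity(raw)),
    ]:
        print(f"{name}: max N_1 = {k_occurrence(sim, k=1).max()}")


if __name__ == "__main__":
    demo()
```

In this sketch, translations would be read off with a plain nearest-neighbor search, for example `sim.argmax(axis=1)`, on whichever similarity matrix is chosen. A large maximum in `k_occurrence` relative to `k` suggests that a few candidates dominate the result lists.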

© The Japanese Society for Artificial Intelligence 2016