Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Paper
Extracting Translation Pairs from Comparable Corpora through Graph-based Label Propagation
Akihiro Tamura, Taro Watanabe, Eiichiro Sumita, Hiroya Takamura, Manabu Okumura
Journal, free access

2013, Vol. 20, No. 2, pp. 133-160

Abstract

This paper proposes a novel method for extracting a bilingual lexicon from comparable corpora using graph-based label propagation. A previous study found that performance drops drastically when the coverage of the seed lexicon is small. We address this problem by using indirect relations with bilingual seeds together with direct relations, representing each word as a distribution over lexical seeds. The seed distributions are propagated over a graph that represents relations among words, and translation pairs are extracted by identifying word pairs whose seed distributions are highly similar. We propose two types of graphs: (1) a co-occurrence graph, which represents co-occurrence relations between words, and (2) a similarity graph, which represents context similarities between words. Evaluations on comparable corpora of English and Japanese patent documents show that the proposed graph propagation method outperforms conventional methods. Further, the similarity graph improved performance by clustering synonyms into the same translation.
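The abstract does not state the propagation rule, so the following is only a rough sketch of the general idea under stated assumptions, not the authors' algorithm. It uses a generic label-propagation scheme: each seed translation pair is one label, seed words are clamped to one-hot distributions, and every other word is repeatedly set to the weighted average of its neighbours' distributions. Candidate translations are then ranked by cosine similarity of seed distributions. The graph format, the hand-set edge weights (standing in for co-occurrence or context-similarity scores), the iteration count, the choice of cosine similarity, the exclusion of seed words from the candidates, the toy vocabulary, and the function names `propagate`, `cosine` and `rank_translations` are all hypothetical.

```python
from math import sqrt


def propagate(graph, seeds, num_labels, iterations=20):
    """Generic label propagation over a weighted word graph (hypothetical sketch).

    graph: word -> {neighbour: edge weight}; every neighbour must also be a key.
    seeds: seed word -> index of its seed translation pair (assumed format).
    Returns word -> distribution (list of floats) over the num_labels seed pairs.
    """
    dist = {w: [0.0] * num_labels for w in graph}
    for w, k in seeds.items():
        dist[w][k] = 1.0  # seed words start one-hot on their own pair
    for _ in range(iterations):
        new = {}
        for w, nbrs in graph.items():
            if w in seeds:
                new[w] = dist[w]  # clamp seeds so they keep their label
                continue
            total = sum(nbrs.values())
            acc = [0.0] * num_labels
            for v, weight in nbrs.items():
                for k, p in enumerate(dist[v]):
                    acc[k] += weight * p
            # Weighted average of the neighbours' previous-iteration distributions.
            new[w] = [a / total for a in acc] if total > 0 else acc
        dist = new
    return dist


def cosine(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norm = sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q))
    return dot / norm if norm > 0 else 0.0


def rank_translations(vec, candidates, top_n=3):
    """Rank target-language words by cosine similarity to a source distribution."""
    scored = sorted(candidates.items(), key=lambda item: (-cosine(vec, item[1]), item[0]))
    return [w for w, _ in scored[:top_n]]


# Hypothetical toy graphs, not data from the paper.
en_graph = {
    "engine": {"car": 2.0, "fuel": 1.0},
    "car": {"engine": 2.0},
    "fuel": {"engine": 1.0},
    "cell": {"battery": 1.0, "charger": 1.0},
    "battery": {"cell": 1.0},
    "charger": {"cell": 1.0},  # no direct edge to any seed word
}
ja_graph = {
    "エンジン": {"車": 2.0, "燃料": 1.0},
    "車": {"エンジン": 2.0},
    "燃料": {"エンジン": 1.0},
    "セル": {"電池": 1.0},
    "電池": {"セル": 1.0},
}
# Hypothetical seed lexicon: label k links the k-th English and Japanese seed.
en_seeds = {"car": 0, "fuel": 1, "battery": 2}
ja_seeds = {"車": 0, "燃料": 1, "電池": 2}

en = propagate(en_graph, en_seeds, num_labels=3)
ja = propagate(ja_graph, ja_seeds, num_labels=3)
candidates = {w: d for w, d in ja.items() if w not in ja_seeds}

print(rank_translations(en["engine"], candidates))  # ['エンジン', 'セル']
print(rank_translations(en["cell"], candidates))    # ['セル', 'エンジン']
```

In this toy run, "charger" has no direct edge to a seed word yet still acquires probability mass on the battery/電池 seed through "cell", which is the kind of indirect relation the abstract relies on when seed coverage is small. Propagation is run separately in each language's graph; only the shared seed labels make the two sets of distributions comparable.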

© 2013 The Association for Natural Language Processing