Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Paper
Extracting Translation Pairs from Comparable Corpora through Graph-based Label Propagation
Akihiro Tamura, Taro Watanabe, Eiichiro Sumita, Hiroya Takamura, Manabu Okumura

2013 Volume 20 Issue 2 Pages 133-160

Abstract
This paper proposes a novel method for bilingual lexicon extraction from comparable corpora using graph-based label propagation. A previous study found that performance drops drastically when the coverage of the seed lexicon is small. We address this problem by using indirect relations with bilingual seeds in addition to direct relations: each word is represented by a distribution over the lexical seeds, and these seed distributions are propagated over a graph that represents relations among words. Translation pairs are then extracted by identifying word pairs whose seed distributions are highly similar. We propose two types of graphs: (1) a co-occurrence graph, representing co-occurrence relations between words; and (2) a similarity graph, representing context similarities between words. Evaluations on comparable corpora of English and Japanese patent documents show that the proposed graph propagation method outperforms conventional methods. Further, the similarity graph improved performance by clustering synonyms into the same translation.
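The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the propagation update, the edge weights, or the similarity measure, so everything below is an assumption made for illustration: a generic iterative label propagation with seed words clamped to one-hot distributions, cosine similarity for comparing seed distributions, and hypothetical toy graphs with romanized placeholder words. It is not the authors' implementation.

```python
from math import sqrt


def propagate(graph, seeds, num_seeds, iterations=20):
    """Spread seed distributions over a weighted graph (assumed update rule).

    graph: {word: {neighbor: edge_weight}}, every neighbor is also a key
    seeds: {word: seed_index} for words covered by the seed lexicon
    """
    # Seed words start as one-hot vectors; all other words start at zero.
    dist = {w: [0.0] * num_seeds for w in graph}
    for w, k in seeds.items():
        dist[w][k] = 1.0

    for _ in range(iterations):
        new = {}
        for w, nbrs in graph.items():
            if w in seeds:
                new[w] = dist[w]  # clamp: seed words keep their own label
                continue
            # Weighted sum of the neighbors' current distributions.
            acc = [0.0] * num_seeds
            for n, weight in nbrs.items():
                for k in range(num_seeds):
                    acc[k] += weight * dist[n][k]
            total = sum(acc)
            new[w] = [a / total for a in acc] if total > 0 else acc
        dist = new
    return dist


def cosine(u, v):
    """Cosine similarity (an assumed choice of similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def extract_pairs(src_dist, tgt_dist, src_seeds, tgt_seeds):
    """Pair each non-seed source word with its most similar non-seed target word."""
    candidates = [t for t in tgt_dist if t not in tgt_seeds]
    pairs = {}
    for s, u in src_dist.items():
        if s in src_seeds:
            continue
        pairs[s] = max(candidates, key=lambda t: cosine(u, tgt_dist[t]))
    return pairs


# Hypothetical toy data: seed index 0 = (patent, tokkyo), 1 = (engine, enjin).
en_seeds = {"patent": 0, "engine": 1}
ja_seeds = {"tokkyo": 0, "enjin": 1}
en_graph = {
    "patent": {"claim": 3.0, "motor": 1.0},
    "engine": {"motor": 3.0, "claim": 1.0},
    "motor": {"engine": 3.0, "patent": 1.0},
    "claim": {"patent": 3.0, "engine": 1.0},
}
ja_graph = {
    "tokkyo": {"seikyu": 3.0, "mota": 1.0},
    "enjin": {"mota": 3.0, "seikyu": 1.0},
    "mota": {"enjin": 3.0, "tokkyo": 1.0},
    "seikyu": {"tokkyo": 3.0, "enjin": 1.0},
}

en_dist = propagate(en_graph, en_seeds, num_seeds=2)
ja_dist = propagate(ja_graph, ja_seeds, num_seeds=2)
pairs = extract_pairs(en_dist, ja_dist, en_seeds, ja_seeds)
print(pairs)
```

In this toy setting the edge weights stand in for either co-occurrence strength or context similarity, depending on which of the two graph types is being modeled. Because both languages share the same seed indices, the propagated distributions live in a common space and can be compared directly across languages.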
© 2013 The Association for Natural Language Processing