This paper proposes a novel method for bilingual lexicon extraction from comparable corpora using graph-based label propagation. A previous study found that performance decreases drastically when the coverage of the seed lexicon is small. We address this problem by using indirect relations with bilingual seeds in addition to direct relations, representing each word by a distribution over lexical seeds. The seed distributions are propagated over a graph that represents relations among words. Translation pairs are then extracted by identifying word pairs whose seed distributions are highly similar. We propose two types of graphs: (1) a co-occurrence graph, representing co-occurrence relations between words; and (2) a similarity graph, representing context similarities between words. Evaluations on comparable corpora of English and Japanese patent documents show that the proposed graph propagation method outperforms conventional methods. Further, the similarity graph improved performance by clustering synonyms into the same translation.
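The propagation and matching steps described above can be illustrated with a minimal sketch. It is not the paper's exact formulation: the weighted-average update, the clamping of seed words, the use of cosine similarity, and the function names `propagate`, `cosine`, and `best_translation` are all assumptions made for illustration, and the toy graphs use hypothetical words and edge weights.

```python
from math import sqrt

def propagate(graph, seeds, iterations=10):
    """Spread seed distributions over a weighted word graph.

    graph: dict word -> {neighbour: edge weight}   (assumed format)
    seeds: dict seed word -> shared bilingual seed id
    Returns dict word -> {seed id: probability}.
    """
    # Seed words start as one-hot distributions; other words start empty.
    dist = {w: ({seeds[w]: 1.0} if w in seeds else {}) for w in graph}
    for _ in range(iterations):
        new = {}
        for w, nbrs in graph.items():
            if w in seeds:
                new[w] = {seeds[w]: 1.0}  # assumption: seeds stay clamped
                continue
            acc = {}
            for n, weight in nbrs.items():
                for s, p in dist.get(n, {}).items():
                    acc[s] = acc.get(s, 0.0) + weight * p
            total = sum(acc.values())
            new[w] = {s: v / total for s, v in acc.items()} if total > 0 else {}
        dist = new
    return dist

def cosine(p, q):
    """Cosine similarity between two sparse seed distributions."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm > 0 else 0.0

def best_translation(word, src_dist, tgt_dist):
    """Pick the target word whose seed distribution is most similar."""
    return max(tgt_dist, key=lambda t: cosine(src_dist[word], tgt_dist[t]))

# Hypothetical toy graphs; seed ids 0 and 1 link the two languages.
en_graph = {
    "engine": {"motor": 2.0},
    "wheel": {"axle": 2.0},
    "motor": {"engine": 2.0, "axle": 1.0},
    "axle": {"wheel": 2.0, "motor": 1.0},
}
ja_graph = {
    "enjin": {"mota": 2.0},
    "sharin": {"shajiku": 2.0},
    "mota": {"enjin": 2.0, "shajiku": 1.0},
    "shajiku": {"sharin": 2.0, "mota": 1.0},
}
en_dist = propagate(en_graph, {"engine": 0, "wheel": 1})
ja_dist = propagate(ja_graph, {"enjin": 0, "sharin": 1})
match = best_translation("motor", en_dist, ja_dist)
```

In this toy setup, "axle" has no direct edge to the seed "engine" and "motor" has none to "wheel", yet each receives mass from the other seed through its non-seed neighbour. This is the kind of indirect relation the abstract refers to.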
In linguistics, sound symbolism is the idea that the vocal sounds of certain words carry meaning in themselves. This paper focuses on the sound symbolism of onomatopoeic words and demonstrates the close relationship between sound symbolism and sentiment polarity. Because onomatopoeic words imitate the sounds they represent, their sound symbolism can help us better understand the sentiment of a sentence. We therefore modeled sound symbolism with N-gram-based features and applied the model to a series of sentiment classification tasks. The experimental results show that the method with sound symbolism significantly outperformed the baseline method without it, which demonstrates that a close relationship exists between sound symbolism and sentiment polarity.
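As a rough illustration of what N-gram-based features over an onomatopoeic word might look like, the sketch below counts character bigrams and trigrams. The abstract does not specify the unit (characters, phonemes, or something else), the values of N, or the language, so the character-level choice, the boundary markers `^` and `$`, and the helper name `char_ngrams` are all assumptions. The romanized Japanese word "kirakira" is used purely as an example input.

```python
from collections import Counter

def char_ngrams(word, n_values=(2, 3)):
    """Count character n-grams of a word, with boundary markers.

    Assumption: sound symbolism is approximated by character n-grams
    of a romanized string; the paper may use a different unit.
    """
    padded = "^" + word + "$"  # mark word start and end
    feats = Counter()
    for n in n_values:
        for i in range(len(padded) - n + 1):
            feats[padded[i:i + n]] += 1
    return feats

# Example input (illustrative only).
features = char_ngrams("kirakira")
```

The resulting counts could be passed to any standard classifier as a sparse feature vector alongside the usual word-level features.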