Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Extracting Bilingual Word Pairs with Maximum Entropy Modeling
KENGO SATO, HIROAKI SAITO

2002 Volume 9 Issue 1 Pages 101-115

Abstract
Translation dictionaries used in multilingual natural language processing applications such as machine translation have been built by hand, but this requires a great deal of labor, and it is difficult to keep the dictionary entries consistent. Consequently, research on automatically extracting bilingual word pairs from parallel corpora has recently become active. In this paper, we propose a method for learning and extracting bilingual word pairs from aligned parallel corpora using maximum entropy modeling. We define a probabilistic model of bilingual word pairs and four types of feature functions that express statistical and linguistic properties such as co-occurrence information and morphological information. Co-occurrence information constrains the senses of words, and morphological information constrains their parts of speech. Experimental results on Japanese-English parallel corpora show that our method outperforms previous methods and, owing to a property of maximum entropy modeling, can extract bilingual word pairs that do not appear in the training corpus with almost the same accuracy as pairs that do.
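The abstract does not spell out the model or its four feature types, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a generic conditional maximum entropy (log-linear) model p(e | j) = exp(Σ_i λ_i f_i(j, e)) / Z(j) that scores English candidates e for a Japanese word j, using two illustrative binary feature functions: strong co-occurrence (a Dice coefficient over aligned sentence pairs, with a threshold of 0.8 chosen here) and part-of-speech agreement. The toy corpus, the POS table standing in for a morphological analyzer, the seed dictionary, the threshold, and all function names (`dice_table`, `features`, `distribution`, `train`, `extract`) are hypothetical. Weights are fit by plain gradient ascent on an L2-regularized log-likelihood, which is a swapped-in estimation procedure rather than necessarily the one the paper uses.

```python
import math
from collections import Counter

# Hypothetical toy sentence-aligned corpus: (Japanese tokens, English tokens).
CORPUS = [
    (["犬", "走る"], ["dog", "runs"]),
    (["犬", "寝る"], ["dog", "sleeps"]),
    (["猫", "走る"], ["cat", "runs"]),
]
# Hypothetical part-of-speech lookup standing in for a morphological analyzer.
POS = {"犬": "N", "猫": "N", "走る": "V", "寝る": "V",
       "dog": "N", "cat": "N", "runs": "V", "sleeps": "V"}


def dice_table(corpus):
    """Dice coefficient 2*c(j,e) / (c(j) + c(e)), counted over aligned sentence pairs."""
    cj, ce, cje = Counter(), Counter(), Counter()
    for ja, en in corpus:
        js, es = set(ja), set(en)
        cj.update(js)
        ce.update(es)
        cje.update((j, e) for j in js for e in es)
    return {(j, e): 2 * cje[(j, e)] / (cj[j] + ce[e]) for j in cj for e in ce}


DICE = dice_table(CORPUS)
EN_VOCAB = sorted({e for _, en in CORPUS for e in en})


def features(j, e):
    """Two illustrative binary feature functions (not the paper's four types)."""
    return [
        1.0 if DICE[(j, e)] >= 0.8 else 0.0,  # strong co-occurrence
        1.0 if POS[j] == POS[e] else 0.0,     # part-of-speech agreement
    ]


def distribution(w, j):
    """Conditional maximum entropy model p(e | j) over the English vocabulary."""
    scores = {e: math.exp(sum(wi * fi for wi, fi in zip(w, features(j, e))))
              for e in EN_VOCAB}
    z = sum(scores.values())  # normalizer Z(j)
    return {e: s / z for e, s in scores.items()}


def train(pairs, iters=200, lr=0.5, l2=0.1):
    """Fit weights by gradient ascent: observed minus expected feature values, minus L2 penalty."""
    w = [0.0, 0.0]
    for _ in range(iters):
        grad = [-l2 * wi for wi in w]
        for j, e in pairs:
            p = distribution(w, j)
            observed = features(j, e)
            for k in range(len(w)):
                expected = sum(p[c] * features(j, c)[k] for c in EN_VOCAB)
                grad[k] += observed[k] - expected
        w = [wi + lr * gi for wi, gi in zip(w, grad)]
    return w


def extract(w, j):
    """Return the most probable English translation of Japanese word j."""
    p = distribution(w, j)
    return max(p, key=p.get)


if __name__ == "__main__":
    # Hypothetical seed dictionary; 猫 and 寝る are deliberately left out of it.
    weights = train([("犬", "dog"), ("走る", "runs")])
    print(extract(weights, "猫"), extract(weights, "寝る"))
```

Because each feature describes a property of a pair (how strongly its words co-occur, whether their parts of speech agree) rather than the identity of the pair, weights learned from 犬/dog and 走る/runs carry over to 猫/cat and 寝る/sleeps, which never appear in the seed dictionary. This is a toy analogue of the abstract's observation that unseen pairs can be extracted with nearly the same accuracy as seen ones.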
© The Association for Natural Language Processing