Extraction of Bilingual Word Pairs Using the Maximum Entropy Method

佐藤 健吾; 斎藤 博昭

doi:10.5715/jnlp.9.101

Abstract

Translation dictionaries used in multilingual natural language processing, such as machine translation, have traditionally been compiled by hand, but this work is labor-intensive and it is difficult to keep dictionary entries consistent. Research on automatically extracting bilingual word pairs from parallel corpora has therefore become active in recent years. In this paper, we propose a method for learning and extracting bilingual word pairs from aligned parallel corpora using maximum entropy modeling. We define a probabilistic model of bilingual word pairs and four types of feature functions that express statistical and linguistic properties such as co-occurrence information and morphological information. Co-occurrence information constrains the sense of words, and morphological information constrains their part of speech. Experimental results on Japanese-English parallel corpora show that our method outperforms previous methods and, owing to the properties of maximum entropy modeling, can extract bilingual word pairs that do not appear in the training corpus with almost the same accuracy as pairs that do.
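
As a rough illustration of how feature functions combine in a maximum entropy (log-linear) model, the following minimal sketch scores candidate translations for one source word. It is not the authors' implementation: the conditional form, the two toy features (`f_cooccur`, `f_same_pos`), the lookup tables, and the hand-set weights are all assumptions made for illustration. The abstract does not specify the four feature types in detail, and in practice the weights would be estimated from the training corpus rather than fixed by hand.

```python
import math

def maxent_prob(features, weights, candidates, source):
    """Return p(target | source) proportional to exp(sum_i w_i * f_i(source, target))."""
    scores = [
        math.exp(sum(w * f(source, t) for f, w in zip(features, weights)))
        for t in candidates
    ]
    z = sum(scores)  # normalizing constant over the candidate set
    return {t: s / z for t, s in zip(candidates, scores)}

# Hypothetical toy data standing in for corpus statistics and a POS tagger.
COOCCUR = {("犬", "dog"): 1.0, ("犬", "cat"): 0.2}
POS = {"犬": "noun", "dog": "noun", "cat": "noun", "run": "verb"}

def f_cooccur(src, tgt):
    # Co-occurrence strength of the pair in aligned sentences (assumed values).
    return COOCCUR.get((src, tgt), 0.0)

def f_same_pos(src, tgt):
    # Fires when both words share a part of speech.
    return 1.0 if POS.get(src) == POS.get(tgt) else 0.0

# Weights are set by hand here purely for demonstration.
probs = maxent_prob([f_cooccur, f_same_pos], [2.0, 1.0], ["dog", "cat", "run"], "犬")
best = max(probs, key=probs.get)
```

Because the score depends only on feature values and not on the identity of the pair, a pair never seen in training can still receive a high probability if its features fire, which is consistent with the generalization property the abstract describes.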
