Extraction of Translation Pairs by Integrated Use of Internal and External Information of Compound Words

吉見 毅彦; 九津見 毅; 小谷 克則; 佐田 いち子; 井佐原 均

doi:10.5715/jnlp.11.4_89

Abstract

This paper proposes a method of extracting English compound words and their Japanese equivalents from a parallel corpus. The aim of our research is to extract compound words that are not listed in a dictionary of an English-to-Japanese MT system and that appear infrequently in a parallel corpus. Our method performs alignment on the basis of two kinds of external evidence provided by the context in which a bilingual pair appears, as well as two kinds of internal evidence within the pair. Each kind of evidence is accompanied by a score, and the aggregate score is computed as a weighted sum of the scores. The appropriate weights are estimated by logistic regression analysis. In an experiment using a parallel corpus of Yomiuri Shimbun and The Daily Yomiuri, 86.36% of the extracted bilingual pairs with the highest scores and 95.08% of those with the top two scores were judged to be correct.
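
The scoring scheme described above (four evidence scores combined as a weighted sum, with weights estimated by logistic regression) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature order (two external scores followed by two internal scores), the toy training data, the learning rate, and the function names `aggregate_score`, `fit_weights`, and `rank_candidates` are all assumptions made for illustration.

```python
import math


def aggregate_score(evidence, weights, bias=0.0):
    """Weighted sum of evidence scores, mapped to (0, 1) by the logistic function."""
    z = bias + sum(w * s for w, s in zip(weights, evidence))
    return 1.0 / (1.0 + math.exp(-z))


def fit_weights(samples, labels, lr=0.5, epochs=2000):
    """Estimate weights by batch gradient descent on the logistic log-loss.

    A plain stand-in for logistic regression analysis; the hyperparameters
    are assumptions, not values from the paper.
    """
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = aggregate_score(x, weights, bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias


def rank_candidates(candidates, weights, bias):
    """Sort (name, evidence) candidates by aggregate score, best first."""
    return sorted(
        candidates,
        key=lambda c: aggregate_score(c[1], weights, bias),
        reverse=True,
    )


# Made-up training data: (external_1, external_2, internal_1, internal_2).
# Label 1 means the pair was judged correct, 0 means incorrect.
TRAIN_X = [
    (0.9, 0.8, 0.7, 0.9),
    (0.8, 0.6, 0.9, 0.7),
    (0.2, 0.3, 0.1, 0.2),
    (0.1, 0.4, 0.3, 0.1),
]
TRAIN_Y = [1, 1, 0, 0]

if __name__ == "__main__":
    weights, bias = fit_weights(TRAIN_X, TRAIN_Y)
    candidates = [
        ("candidate_a", (0.1, 0.1, 0.1, 0.1)),
        ("candidate_b", (0.9, 0.9, 0.9, 0.9)),
    ]
    for name, evidence in rank_candidates(candidates, weights, bias):
        print(name, round(aggregate_score(evidence, weights, bias), 3))
```

Passing the weighted sum through the logistic function leaves the ranking of candidates unchanged, since the function is monotonic; it only rescales the aggregate score so that it can be read as an estimated probability that a pair is correct.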
