2004 Volume 11 Issue 4 Pages 89-103
This paper proposes a method of extracting English compound words and their Japanese equivalents from a parallel corpus. The aim of our research is to extract compound words that are not listed in the dictionary of an English-to-Japanese MT system and that appear infrequently in a parallel corpus. Our method performs alignment on the basis of two kinds of external evidence provided by the context in which a bilingual pair appears, as well as two kinds of internal evidence within the pair. Each kind of evidence is assigned a score, and the aggregate score is computed as a weighted sum of these scores. The appropriate weights are estimated by logistic regression analysis. In an experiment using a parallel corpus of Yomiuri Shimbun and The Daily Yomiuri, 86.36% of the extracted bilingual pairs with the highest scores, and 95.08% of those with the top two scores, were judged to be correct.
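
The scoring scheme in the abstract (four evidence scores combined as a weighted sum, with weights estimated by logistic regression) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the ordering of the four scores, the toy training rows, and the use of plain batch gradient descent are all assumptions made for illustration, and the numbers are invented.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(samples, labels, lr=0.5, epochs=2000):
    """Estimate one weight per evidence score (plus a bias) by batch
    gradient descent on the logistic log loss."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias

def aggregate_score(weights, scores):
    """Aggregate score of one candidate pair: weighted sum of its scores."""
    return sum(w * s for w, s in zip(weights, scores))

def rank_candidates(weights, candidates):
    """Sort (label, scores) candidates by aggregate score, best first."""
    return sorted(candidates,
                  key=lambda c: aggregate_score(weights, c[1]),
                  reverse=True)

# Invented toy data. Each row is assumed to hold
# (external_1, external_2, internal_1, internal_2); label 1 = correct pair.
TRAIN_X = [
    (0.9, 0.8, 0.7, 0.9),
    (0.8, 0.7, 0.9, 0.6),
    (0.7, 0.9, 0.8, 0.8),
    (0.2, 0.1, 0.3, 0.2),
    (0.1, 0.3, 0.2, 0.1),
    (0.3, 0.2, 0.1, 0.3),
]
TRAIN_Y = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    weights, bias = fit_logistic(TRAIN_X, TRAIN_Y)
    candidates = [
        ("candidate A", (0.1, 0.1, 0.1, 0.1)),
        ("candidate B", (0.9, 0.9, 0.9, 0.9)),
    ]
    for label, scores in rank_candidates(weights, candidates):
        print(label, round(aggregate_score(weights, scores), 3))
```

The fitted bias is left out of the ranking step because adding the same constant to every candidate does not change their order. Keeping the first one or two entries of the ranked list would correspond to the "highest score" and "top two scores" settings reported in the abstract.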