Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Integrated Use of Internal and External Evidence in the Alignment of Compound Words
TAKEHIKO YOSHIMI, TAKESHI KUTSUMI, KATSUNORI KOTANI, ICHIKO SATA, HITOSHI ISAHARA
JOURNAL FREE ACCESS

2004 Volume 11 Issue 4 Pages 89-103

Abstract

This paper proposes a method of extracting English compound words and their Japanese equivalents from a parallel corpus. The aim of our research is to extract compound words that are not listed in a dictionary of an English-to-Japanese MT system and that appear infrequently in a parallel corpus. Our method performs alignment on the basis of two kinds of external evidence provided by the context in which a bilingual pair appears, as well as two kinds of internal evidence within the pair. Each kind of evidence is accompanied by a score, and the aggregate score is computed as a weighted sum of the scores. The appropriate weights are estimated by logistic regression analysis. In an experiment using a parallel corpus of the Yomiuri Shimbun and The Daily Yomiuri, 86.36% of the extracted bilingual pairs with the highest scores and 95.08% with the top two scores were judged to be correct.
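
The scoring scheme described in the abstract (a weighted sum of four evidence scores, with weights estimated by logistic regression) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy feature values, the labels, the function names, and the use of plain batch gradient descent to fit the logistic model. The abstract does not specify how the evidence scores are defined or how the regression was fitted, so this is not the authors' implementation.

```python
import math

# Hypothetical toy data, invented for illustration (not from the paper).
# Each row holds four evidence scores for one candidate bilingual pair:
# two external (contextual) scores followed by two internal scores.
FEATURES = [
    [0.9, 0.8, 0.7, 0.9],
    [0.8, 0.6, 0.9, 0.7],
    [0.7, 0.9, 0.8, 0.6],
    [0.2, 0.1, 0.3, 0.2],
    [0.1, 0.3, 0.2, 0.1],
    [0.3, 0.2, 0.1, 0.3],
]
# 1 = pair judged correct, 0 = pair judged incorrect (also invented).
LABELS = [1, 1, 1, 0, 0, 0]


def aggregate_score(scores, weights, bias=0.0):
    """Weighted sum of the evidence scores plus an optional intercept."""
    return bias + sum(w * s for w, s in zip(weights, scores))


def sigmoid(z):
    """Logistic function mapping an aggregate score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_weights(features, labels, lr=0.5, epochs=2000):
    """Estimate weights by batch gradient descent on the logistic loss.

    Gradient descent is an assumption of this sketch; any standard
    logistic regression solver could be substituted.
    """
    n_features = len(features[0])
    n_samples = len(features)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(aggregate_score(x, weights, bias)) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


def rank_candidates(candidates, weights, bias):
    """Sort (name, scores) candidates by aggregate score, best first."""
    return sorted(
        candidates,
        key=lambda c: aggregate_score(c[1], weights, bias),
        reverse=True,
    )


WEIGHTS, BIAS = fit_logistic_weights(FEATURES, LABELS)
```

With weights fitted this way, candidate translations for a given compound word could be ranked with `rank_candidates`, and the top one or two candidates kept, which mirrors the "highest score" and "top two scores" evaluation settings mentioned in the abstract.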

© The Association for Natural Language Processing