Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Reliable Measures for Aligning Japanese-English News Articles and Sentences
MASAO UTIYAMA, HITOSHI ISAHARA

2003 Volume 10 Issue 4 Pages 201-220

Abstract
We have aligned Japanese and English news articles and sentences, extracted from the Yomiuri and the Daily Yomiuri newspapers, to make a large parallel corpus. We first used a method based on cross-language information retrieval to align the Japanese and English articles and then used a method based on dynamic programming (DP) matching to align the Japanese and English sentences in these articles. However, the resulting article and sentence alignments included many incorrect pairs. To remove these, we propose two measures that evaluate the validity of the alignments. Using these measures, we successfully extracted valid correspondences for about 47 thousand article pairs, 150 thousand 1-to-1 sentence pairs, and 38 thousand 1-to-many sentence pairs. We were therefore able to build the largest Japanese-English parallel corpus available to the public.
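To make the DP matching step concrete, the following is a minimal sketch of monotone sentence alignment that allows 1-to-1, 1-to-many, and many-to-1 groupings. It is an illustration only, not the authors' implementation: the names `align_sentences` and `jaccard`, the `max_group` limit, and the token-overlap similarity are all assumptions introduced here, and the abstract does not specify the similarity or validity measures actually used. The sketch also assumes the source sentences have already been mapped into the target language (for example by dictionary lookup), so that token overlap is meaningful.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A bead is ((src_start, src_end), (tgt_start, tgt_end)), end-exclusive.
Bead = Tuple[Tuple[int, int], Tuple[int, int]]


def jaccard(src_group: Sequence[str], tgt_group: Sequence[str]) -> float:
    """Hypothetical similarity: token-set overlap between two sentence groups."""
    a = set(" ".join(src_group).lower().split())
    b = set(" ".join(tgt_group).lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0


def align_sentences(
    src: Sequence[str],
    tgt: Sequence[str],
    sim: Callable[[Sequence[str], Sequence[str]], float] = jaccard,
    max_group: int = 2,
) -> List[Bead]:
    """Find the monotone alignment with the highest total similarity."""
    n, m = len(src), len(tgt)
    neg = float("-inf")
    best = [[neg] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    best[0][0] = 0.0

    # Allowed steps: 1-to-1, 1-to-k, k-to-1, plus skips that score 0.
    shapes = [(1, 1)]
    shapes += [(1, k) for k in range(2, max_group + 1)]
    shapes += [(k, 1) for k in range(2, max_group + 1)]
    shapes += [(1, 0), (0, 1)]

    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == neg:
                continue
            for di, dj in shapes:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                gain = 0.0 if di == 0 or dj == 0 else sim(src[i:ni], tgt[j:nj])
                if best[i][j] + gain > best[ni][nj]:
                    best[ni][nj] = best[i][j] + gain
                    back[ni][nj] = (i, j)

    # Trace back from the end, keeping only beads with sentences on both sides.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if pi != i and pj != j:
            beads.append(((pi, i), (pj, j)))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    src = ["the cat sat", "dogs bark loudly"]
    tgt = ["the cat sat", "dogs bark", "loudly"]
    for (s0, s1), (t0, t1) in align_sentences(src, tgt):
        print(src[s0:s1], "<->", tgt[t0:t1])
```

In this toy input the second source sentence is paired with two target sentences, which is the kind of 1-to-many pair the abstract counts separately from 1-to-1 pairs. A validity measure such as the ones the paper proposes would then be applied to the resulting pairs to discard unreliable ones; that filtering step is not shown here.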
© The Association for Natural Language Processing