Clean-up of a Large-Scale Parallel Corpus Based on Recursive Learning

松永 務; 佐藤 大輔; 原 正巳

doi:10.3156/jsoft.29.1_527

Abstract

Statistical machine translation methods have been developed using parallel corpora, but collecting large numbers of good-quality parallel sentence pairs remains a technical challenge. This paper proposes a novel method of parallel corpus revision (clean-up) based on recursive learning, in which a statistical machine translation system trained on the parallel corpus is used to quantify the differences between sentences in one language and sentences in the other. Using the number of edits as the measure of sentence difference, we present experimental results of the clean-up on a Japanese-English patent parallel corpus.
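
The abstract gives no implementation details, so the following is only a minimal sketch of how such a clean-up loop might look, under several assumptions. It assumes that "edit numbers" means a word-level edit (Levenshtein) distance, that clean-up means dropping pairs whose distance is too large (the paper may instead revise them), and that recursion means retraining on the surviving pairs. The names `clean_up`, `train`, `max_ratio`, and `max_rounds` are hypothetical and do not come from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]                # (source sentence, target sentence)
Translator = Callable[[str], str]     # maps a source sentence to a translation


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def clean_up(pairs: List[Pair],
             train: Callable[[List[Pair]], Translator],
             max_ratio: float = 0.5,
             max_rounds: int = 3) -> List[Pair]:
    """Hypothetical loop: train, score each pair, drop distant pairs, repeat.

    `train` stands in for building a translation system from the current
    corpus; it is an assumption, not an API from the paper.
    """
    for _ in range(max_rounds):
        translate = train(pairs)
        kept = []
        for src, tgt in pairs:
            hyp = translate(src).split()
            ref = tgt.split()
            # Normalize by the longer sequence so the ratio lies in [0, 1].
            ratio = edit_distance(hyp, ref) / max(len(hyp), len(ref), 1)
            if ratio <= max_ratio:
                kept.append((src, tgt))
        if len(kept) == len(pairs):   # nothing removed, so stop early
            break
        pairs = kept
    return pairs
```

Normalizing the distance by sentence length is a design choice made here so that a single threshold can apply to both short and long sentences; the paper's actual criterion is not stated in the abstract.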
