Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Paper
Selecting Syntactic, Non-redundant Segments in Active Learning for Machine Translation
Akiva Miura, Graham Neubig, Michael Paul, Satoshi Nakamura
JOURNAL FREE ACCESS

2017 Volume 24 Issue 3 Pages 463-489

Abstract

Active learning is a framework that makes it possible to train statistical models efficiently by selecting informative examples from a pool of unlabeled data. Previous work has found this framework effective for machine translation (MT), making it possible to train better translation models with less effort, particularly when annotators translate short phrases instead of full sentences. However, previous methods for phrase-based active learning in MT do not consider whether the selected units are coherent and easy for human translators to translate, and they tend to select redundant phrases with similar content. In this paper, we tackle these problems by proposing two new methods for selecting more syntactically coherent and less redundant segments in active learning for MT. Experiments using both simulation and extensive manual translation by professional translators find the proposed methods effective: they achieve a greater gain in BLEU score for the same number of translated words and allow translators to be more confident in their translations.

© 2017 The Association for Natural Language Processing