Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Paper
Parser Self-Training for Syntax-Based Machine Translation
Makoto Morishita, Koichi Akabe, Yuto Hatakoshi, Graham Neubig, Koichiro Yoshino, Satoshi Nakamura
JOURNAL FREE ACCESS

2016 Volume 23 Issue 4 Pages 353-376

Abstract

In syntax-based machine translation, parsing accuracy is known to greatly affect translation accuracy. Self-training, which uses parser output as training data, is one method for improving parser accuracy. However, because automatically generated parse trees often contain errors, they do not always contribute to improving accuracy. In this paper, we propose a method that uses automatic translation evaluation metrics to remove noisy, incorrect parse trees from the training data, thereby improving the effect of self-training. Specifically, we perform syntax-based machine translation using n-best parse trees, and then re-score the parse trees based on the automatic evaluation scores of the resulting translations. By using the higher-scoring parse trees among the candidates for self-training, we can improve parsing and machine translation accuracy using parallel corpora that are not annotated with syntactic structure. In experiments using the higher-scoring parse trees for self-training, we found that the self-trained parsers significantly improve a state-of-the-art syntax-based machine translation system in two language pairs, and also significantly improve the accuracy of parsing itself.
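
The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `select_trees`, `toy_translate`, and `unigram_precision`, the `min_score` threshold, and all example trees and sentences are hypothetical. A toy unigram-precision score stands in for whichever automatic evaluation metric is actually used, and a lookup table stands in for the syntax-based translation system.

```python
from typing import Callable, List, Sequence, Tuple


def unigram_precision(hypothesis: str, reference: str) -> float:
    """Toy stand-in for an automatic MT metric (assumption, not the paper's metric):
    the fraction of hypothesis tokens that also occur in the reference."""
    hyp_tokens = hypothesis.split()
    if not hyp_tokens:
        return 0.0
    ref_tokens = set(reference.split())
    return sum(tok in ref_tokens for tok in hyp_tokens) / len(hyp_tokens)


def select_trees(
    nbest: Sequence[Sequence[str]],
    references: Sequence[str],
    translate: Callable[[str], str],
    score: Callable[[str, str], float],
    min_score: float = 0.0,
) -> List[Tuple[str, float]]:
    """For each sentence, translate every candidate parse tree, score the
    translation against the reference, and keep the best-scoring tree.
    Sentences whose best score is below min_score are dropped entirely."""
    selected = []
    for trees, reference in zip(nbest, references):
        if not trees:
            continue  # the parser produced no candidates for this sentence
        scored = [(score(translate(tree), reference), tree) for tree in trees]
        best_score, best_tree = max(scored, key=lambda pair: pair[0])
        if best_score >= min_score:
            selected.append((best_tree, best_score))
    return selected


# Made-up candidate trees and translations, for illustration only.
TREE_A = "(S (NP he) (VP (V saw) (NP her)))"
TREE_B = "(S (NP he saw) (VP her))"
TREE_C = "(S (NP dogs) (VP bark))"
TOY_OUTPUT = {
    TREE_A: "kare wa kanojo o mita",
    TREE_B: "kare wa kanojo o kiita",
    TREE_C: "neko ga naku",
}


def toy_translate(tree: str) -> str:
    """Hypothetical translator: looks up a canned output for each tree."""
    return TOY_OUTPUT[tree]


selected = select_trees(
    nbest=[[TREE_B, TREE_A], [TREE_C]],
    references=["kare wa kanojo o mita", "inu ga hoeru"],
    translate=toy_translate,
    score=unigram_precision,
    min_score=0.5,
)
print(selected)
```

The trees returned by `select_trees` would then be added to the parser's training data. Whether to keep only the single best tree per sentence and whether to apply a score threshold are design choices made for this sketch, not details stated in the abstract.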

© 2016 The Association for Natural Language Processing