Hierarchical Sub-sentential Alignment with IBM Models for Statistical Phrase-based Machine Translation

Hao Wang; Yves Lepage

doi:10.5715/jnlp.24.619

Abstract

In this paper, we describe a novel method for joint word alignment and symmetrization. Starting from initial parameters estimated with simple IBM models, we synchronously parse each parallel sentence pair under bracketing transduction grammar (BTG) constraints. Our two-phase method achieves nearly the same run time as fast_align while delivering better alignments on distantly related language pairs such as English–Japanese. We show how to integrate this method into a standard phrase-based SMT pipeline. Although the alignment quality results are mixed, forcing all words to be aligned (1-to-many/many-to-1) significantly reduces the phrase table size without degrading translation quality, and our method even outperforms fast_align in some end-to-end translation experiments.
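
To make the idea of parsing a sentence pair under BTG-style constraints more concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a matrix of word-to-word association scores is already available (for instance, lexical translation probabilities from an IBM model) and greedily splits the sentence pair top-down into two sub-blocks, in either straight or inverted order, until one side of a block contains a single word. The function names `align` and `block_sum` are hypothetical, and the plain sum of scores is used as the split criterion purely for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def block_sum(prefix: Matrix, i0: int, i1: int, j0: int, j1: int) -> float:
    """Sum of scores over source span [i0, i1) and target span [j0, j1)."""
    return prefix[i1][j1] - prefix[i0][j1] - prefix[i1][j0] + prefix[i0][j0]


def align(scores: Matrix) -> List[Tuple[int, int]]:
    """Greedy top-down binary segmentation (illustrative assumption only).

    scores[i][j] is an association score between source word i and
    target word j. Returns (source, target) links covering every word.
    """
    n, m = len(scores), len(scores[0])

    # 2-D prefix sums so that any block sum costs O(1).
    prefix = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            prefix[i + 1][j + 1] = (scores[i][j] + prefix[i][j + 1]
                                    + prefix[i + 1][j] - prefix[i][j])

    links: List[Tuple[int, int]] = []

    def split(i0: int, i1: int, j0: int, j1: int) -> None:
        # Base case: a single word on one side is linked to every word
        # on the other side (1-to-many / many-to-1), so no word is left
        # unaligned.
        if i1 - i0 == 1 or j1 - j0 == 1:
            links.extend((i, j) for i in range(i0, i1) for j in range(j0, j1))
            return

        best = None  # (score, source split, target split, inverted?)
        for i in range(i0 + 1, i1):
            for j in range(j0 + 1, j1):
                # Straight: left source with left target, right with right.
                straight = (block_sum(prefix, i0, i, j0, j)
                            + block_sum(prefix, i, i1, j, j1))
                # Inverted: left source with right target, and vice versa.
                inverted = (block_sum(prefix, i0, i, j, j1)
                            + block_sum(prefix, i, i1, j0, j))
                for score, is_inverted in ((straight, False), (inverted, True)):
                    if best is None or score > best[0]:
                        best = (score, i, j, is_inverted)

        _, i, j, is_inverted = best
        if is_inverted:
            split(i0, i, j, j1)
            split(i, i1, j0, j)
        else:
            split(i0, i, j0, j)
            split(i, i1, j, j1)

    split(0, n, 0, m)
    return sorted(links)
```

For a three-word source sentence whose first two words both associate with the first target word, `align([[0.8, 0.1], [0.7, 0.1], [0.1, 0.9]])` links source words 0 and 1 to target word 0 and source word 2 to target word 1. Because every block is either split further or fully linked, each word ends up in at least one link, which mirrors the forced-alignment property described in the abstract.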

Licensed under CC BY 4.0
https://creativecommons.org/licenses/by/4.0/
