Journal of Information Processing
Online ISSN : 1882-6652
ISSN-L : 1882-6652
Hierarchical Back-off Modeling of Hiero Grammar based on Non-parametric Bayesian Model
Hidetaka KamigaitoTaro WatanabeHiroya TakamuraManabu OkumuraEiichiro Sumita
Author information
JOURNAL FREE ACCESS

2017 Volume 25 Pages 912-923

Details
Abstract

In hierarchical phrase-based machine translation, a rule table is automatically learned by heuristically extracting synchronous rules from a parallel corpus. As a result, spuriously many rules are extracted which may be composed of various incorrect rules. The larger rule table incurs more disk and memory resources, and sometimes results in lower translation quality. To resolve the problems, we propose a hierarchical back-off model for Hiero grammar, an instance of a synchronous context free grammar (SCFG), on the basis of the hierarchical Pitman-Yor process. The model can generate compact rules and phrase pairs without resorting to any heuristics, because longer rules and phrase pairs are automatically backing off to smaller phrases under SCFG. Inference is efficiently carried out using two-step synchronous parsing of Xiao et al. combined with slice sampling. In our experiments, the proposed model achieved a higher or at least comparable translation quality against a previous Bayesian model on various language pairs: German/French/Spanish/Japanese-English. When compared against heuristic models, our model achieved comparable translation quality on a full size German-English language pair in Europarl v7 corpus with a significantly smaller grammar size; less than 10% of that for heuristic models.

Content from these authors
© 2017 by the Information Processing Society of Japan
Previous article Next article
feedback
Top