2022, Vol. 29, No. 2, p. 587-610
It has been reported that grammatical information is useful for machine translation (MT). However, annotating grammatical information incurs significant human cost. Moreover, adapting such annotations to MT is not trivial: grammatical annotation usually follows tokenization standards that may not capture the correspondence between two languages, whereas subword tokenization such as byte-pair encoding, which is used to alleviate out-of-vocabulary problems, may be incompatible with those annotations. In this work, we introduce two methods to incorporate grammatical information without explicit annotation supervision: first, latent phrase structures are induced in an unsupervised fashion from an attention mechanism; second, the latent phrase structures induced in the encoder and decoder are synchronized through training-time constraints so that they are compatible with each other. We demonstrate that our approach performs better on two tasks, translation and word alignment, without extra resources. An analysis of the induced phrase and alignment structures shows that the synchronization constraint improves the precision of the alignments.
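The abstract does not spell out the induction procedure, but the general idea of deriving a latent phrase structure from attention-like scores can be sketched as follows. This is a minimal illustration, not the authors' method: the dot-product similarity standing in for attention weights and the greedy adjacent-merge heuristic are assumptions made for the example.

```python
def induce_tree(tokens, vecs):
    """Greedily merge the adjacent pair with the strongest
    attention-like link, yielding a binary phrase structure.

    Illustrative sketch only: similarity is a plain dot product
    between token representations, and merged spans are represented
    by the mean of their children's vectors.
    """
    nodes = list(tokens)
    reps = [list(v) for v in vecs]
    while len(nodes) > 1:
        # score each adjacent pair by dot-product similarity
        scores = [sum(a * b for a, b in zip(reps[i], reps[i + 1]))
                  for i in range(len(nodes) - 1)]
        i = max(range(len(scores)), key=scores.__getitem__)
        # merge the highest-scoring pair into one phrase node
        nodes[i:i + 2] = [(nodes[i], nodes[i + 1])]
        reps[i:i + 2] = [[(a + b) / 2 for a, b in zip(reps[i], reps[i + 1])]]
    return nodes[0]

# toy example: "the" and "cat" have similar vectors, so they merge first
tree = induce_tree(["the", "cat", "sat"], [[1, 0], [1, 0], [0, 1]])
print(tree)  # (('the', 'cat'), 'sat')
```

In the paper's setting, the analogous structures induced on the encoder and decoder sides would additionally be constrained to agree during training; that synchronization is not modeled here.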