Journal of Natural Language Processing (自然言語処理)
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Paper
Noise-aware Character Alignment for Extracting Transliteration Fragments
Katsuhito Sudoh, Shinsuke Mori, Masaaki Nagata
Journal, free access

2014, Volume 21, Issue 6, pp. 1107-1131

Abstract
This paper proposes a novel noise-aware character alignment method for automatically extracting transliteration fragments in phrase pairs that are extracted from parallel corpora. The proposed method extends a many-to-many Bayesian character alignment method by distinguishing transliteration (signal) parts from non-transliteration (noise) parts. The model can be trained efficiently by a state-based blocked Gibbs sampling algorithm with signal and noise states. The proposed method bootstraps statistical machine transliteration using the extracted transliteration fragments to train transliteration models. In experiments using Japanese-English patent data, the proposed method was able to extract transliteration fragments with much less noise than an IBM-model-based baseline, and achieved better transliteration performance than sample-wise extraction in transliteration bootstrapping.
© 2014 The Association for Natural Language Processing