Inflating a Small Parallel Corpus into a Large Quasi-parallel Corpus Using Monolingual Data for Chinese-Japanese Machine Translation

Wei Yang; Hanfei Shen; Yves Lepage

doi:10.2197/ipsjjip.25.88

Abstract

Increasing the size of parallel corpora for less-resourced language pairs is essential for machine translation (MT). To address the shortage of parallel corpora between Chinese and Japanese, we propose a method that constructs a quasi-parallel corpus by inflating a small Chinese-Japanese parallel corpus, with the goal of improving statistical machine translation (SMT) quality. We generate new sentences using analogical associations based on large amounts of monolingual data and a small amount of parallel data. Because this process over-generates, we filter the generated sentences using two methods: one based on BLEU and the other based on N-sequences. We add the resulting aligned quasi-parallel corpus to a small parallel Chinese-Japanese corpus and perform SMT experiments. We obtain significant improvements over a baseline system.
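The N-sequence filtering step can be illustrated with a minimal sketch. The abstract does not specify the procedure, so the version below rests on an assumption: a generated sentence is kept only if every character N-gram it contains is attested in a reference monolingual corpus. The function names (`char_ngrams`, `build_reference_ngrams`, `filter_candidates`) and the choice of character-level N-grams are hypothetical and chosen for illustration; they are not taken from the paper.

```python
def char_ngrams(sentence, n):
    """Return the set of all character n-grams of a sentence."""
    return {sentence[i:i + n] for i in range(len(sentence) - n + 1)}


def build_reference_ngrams(corpus, n):
    """Collect every character n-gram attested in a reference corpus."""
    attested = set()
    for sentence in corpus:
        attested |= char_ngrams(sentence, n)
    return attested


def filter_candidates(candidates, attested, n):
    """Keep candidates whose n-grams are all attested (assumed criterion).

    Candidates shorter than n have no n-grams and are dropped.
    """
    kept = []
    for candidate in candidates:
        grams = char_ngrams(candidate, n)
        if grams and grams <= attested:
            kept.append(candidate)
    return kept


if __name__ == "__main__":
    reference = ["abcde", "cdefg"]          # toy monolingual corpus
    generated = ["abcdefg", "abxde", "ab"]  # toy over-generated sentences
    attested = build_reference_ngrams(reference, 3)
    print(filter_candidates(generated, attested, 3))
```

Under this assumed criterion, a larger N makes the filter stricter, since longer sequences are less likely to be attested in the reference data. Chinese and Japanese text can be processed at the character level without prior word segmentation, which is why the sketch uses characters rather than words.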
