Journal of Natural Language Processing

Preface

On Assuming the Role of Editor-in-Chief

Manabu Okumura

2016, Volume 23, Issue 4, p. 325
Published: 2016/09/15
Released: 2016/12/15

DOI: https://doi.org/10.5715/jnlp.23.325

Journal, free access

Papers

Unsupervised Word Alignment Using Frequency Constraint in Posterior Regularized EM

Hidetaka Kamigaito, Taro Watanabe, Hiroya Takamura, Manabu Okumura, Ei ...

2016, Volume 23, Issue 4, p. 327-351
Published: 2016/09/15
Released: 2016/12/15

DOI: https://doi.org/10.5715/jnlp.23.327

Journal, free access

Generative word alignment models, such as the IBM Models, are restricted to one-to-many alignment and cannot explicitly represent many-to-many relationships in bilingual texts. The problem is partially solved either by introducing heuristics or by imposing agreement constraints that make the two directional word alignments agree with each other. However, such constraints cannot take into account the grammatical differences between the languages in a pair. In particular, function words are not trivial to align for grammatically different language pairs, such as Japanese and English. In this paper, we focus on the posterior regularization framework (Ganchev, Graca, Gillenwater, and Taskar 2010), which can force two directional word alignment models to agree with each other during training, and propose new constraints that take into account the difference between function words and content words. We distinguish function words from content words using word frequency, in the same way as Setiawan, Kan, and Li (2007). Experimental results show that our proposed constraints achieved better alignment quality on the French-English Hansard task and the Japanese-English Kyoto Free Translation Task (KFTT), as measured by AER and F-measure. In translation evaluations, we achieved statistically significant gains in BLEU scores on the Japanese-English NTCIR10 task and the Spanish-English WMT06 task.
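The abstract says only that function words and content words are distinguished by word frequency. The sketch below illustrates one simple way this could be done, under the assumption that the `top_n` most frequent word types are treated as function words. The function name `split_by_frequency`, the `top_n` parameter, and the toy corpus are hypothetical and are not taken from the paper.

```python
from collections import Counter


def split_by_frequency(corpus, top_n):
    """Split the vocabulary of a tokenized corpus by frequency rank.

    Assumption (not from the paper): the `top_n` most frequent word
    types are treated as function words, all others as content words.
    """
    # Count every token across all sentences.
    counts = Counter(token for sentence in corpus for token in sentence)
    # Word types ordered from most to least frequent.
    ranked = [word for word, _ in counts.most_common()]
    function_words = set(ranked[:top_n])
    content_words = set(ranked[top_n:])
    return function_words, content_words


# Toy corpus for illustration only.
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]
function_words, content_words = split_by_frequency(corpus, top_n=1)
```

With this toy corpus, "the" is the only word type placed in `function_words`. On real data the cutoff would need tuning per language.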

Parser Self-Training for Syntax-Based Translation

Makoto Morishita, Koichi Akabe, Yuto Hatakoshi, Graham Neubig, Koichiro Yoshino, Satoshi Nakamura

2016, Volume 23, Issue 4, p. 353-376
Published: 2016/09/15
Released: 2016/12/15

DOI: https://doi.org/10.5715/jnlp.23.353

Journal, free access

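In syntax-based translation, a machine translation approach that takes syntactic information into account, the accuracy of the parser is known to have a large effect on translation accuracy. Parser self-training, in which the parser's own output is used as training data, has been proposed as one way to improve parsing accuracy. However, because the parse trees produced by a parser contain errors, automatically generated trees do not always contribute to higher accuracy. In this paper, we propose a method that uses automatic evaluation metrics for machine translation to remove such erroneous parse trees from the training data and thereby improve the effect of self-training. Specifically, we perform syntax-based translation with each of the n-best parse trees and rescore each translation result with an automatic evaluation metric. By using the parse trees with good scores for self-training, data that has parallel translations but no syntactic annotation can be used to improve the accuracy of both parsing and machine translation. Experiments confirmed that using models self-trained with this method significantly improved the translation accuracy of a syntax-based translation system on two language pairs, and also significantly improved the accuracy of parsing itself.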
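The selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `translate` and `score` callables stand in for a syntax-based translation system and an automatic evaluation metric, and keeping only the best tree per sentence when its score reaches a fixed `threshold` is an assumption made here. All names and the toy data are hypothetical.

```python
def select_trees_for_self_training(nbest_trees, references, translate, score, threshold):
    """Pick parse trees whose translations score well against the reference.

    nbest_trees: list (one entry per sentence) of lists of candidate trees.
    references:  list of reference translations, aligned with nbest_trees.
    translate:   callable tree -> translation (stand-in for a syntax-based MT system).
    score:       callable (hypothesis, reference) -> float (stand-in for an MT metric).
    threshold:   assumed cutoff; the best tree is kept only if its score reaches it.
    """
    selected = []
    for trees, reference in zip(nbest_trees, references):
        if not trees:
            continue
        # Translate with every candidate tree and score each translation.
        scored = [(score(translate(tree), reference), tree) for tree in trees]
        best_score, best_tree = max(scored, key=lambda pair: pair[0])
        if best_score >= threshold:
            selected.append(best_tree)
    return selected


# Toy stand-ins for illustration only: a lookup table instead of an MT system,
# and a position-wise token match ratio instead of a real metric.
toy_translations = {
    "(S (NP x) (VP y z))": "he eats rice",
    "(S (NP x y) (VP z))": "rice eats he",
    "(S (NP p) (VP q))": "books she",
    "(S (NP p q))": "she books reads",
}


def toy_score(hypothesis, reference):
    hyp, ref = hypothesis.split(), reference.split()
    matches = sum(1 for a, b in zip(hyp, ref) if a == b)
    return matches / max(len(hyp), len(ref), 1)


nbest = [
    ["(S (NP x) (VP y z))", "(S (NP x y) (VP z))"],
    ["(S (NP p) (VP q))", "(S (NP p q))"],
]
references = ["he eats rice", "she reads books"]
selected = select_trees_for_self_training(
    nbest, references, toy_translations.get, toy_score, threshold=0.5
)
```

In this toy run only the first sentence contributes a tree, because no candidate for the second sentence yields a translation that reaches the threshold. The selected trees would then be added to the parser's training data.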
