自然言語処理 (Journal of Natural Language Processing)
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper
Japanese–English Conversation Parallel Corpus for Promoting Context-aware Machine Translation Research
Matīss Rikters, Ryokan Ri, Tong Li, Toshiaki Nakazawa

2021, Vol. 28, No. 2, pp. 380-403

Abstract

Most machine translation (MT) research has focused on sentences as translation units (sentence-level MT) and has achieved acceptable translation quality, mainly in high-resource languages, for sentences that do not require cross-sentential context. Recently, many researchers have worked on MT models that can take cross-sentential context into account; these are often called context-aware or document-level MT models. Document-level MT is difficult 1) to train, because only a small amount of document-level data is available, and 2) to evaluate, because the main evaluation methods and datasets focus on sentence-level evaluation. To address the first issue, we present a Japanese–English conversation corpus in which cross-sentential context is available. To address the second, we manually identify the main areas in which sentence-level MT fails to produce adequate translations in the absence of context. We then create an evaluation set in which these phenomena are annotated to facilitate the automatic evaluation of document-level systems. Finally, we train MT models on our corpus to demonstrate how the use of context leads to improvements.
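
The abstract does not say how context is supplied to the models. One widely used baseline for context-aware MT simply concatenates the preceding source sentence(s) to the current one with a separator token and trains on the current sentence's translation as the target. The sketch below illustrates that general idea for a turn-aligned conversation; it is an assumption, not the paper's confirmed setup, and the function name `build_context_examples`, the `<SEP>` token and the `context_size` parameter are hypothetical.

```python
from typing import List, Tuple

# Hypothetical separator token; real systems define their own special token.
SEP = "<SEP>"


def build_context_examples(
    src_turns: List[str],
    tgt_turns: List[str],
    context_size: int = 1,
) -> List[Tuple[str, str]]:
    """Pair each source turn with up to `context_size` preceding source turns.

    Returns (source_with_context, target) pairs for one conversation.
    Only the current turn's translation is the target (a "k-to-1" setup).
    """
    if len(src_turns) != len(tgt_turns):
        raise ValueError("source and target conversations must be aligned turn by turn")
    examples = []
    for i, (src, tgt) in enumerate(zip(src_turns, tgt_turns)):
        # Preceding source turns, truncated at the start of the conversation.
        context = src_turns[max(0, i - context_size):i]
        source = f" {SEP} ".join(context + [src])
        examples.append((source, tgt))
    return examples


if __name__ == "__main__":
    ja = ["これ見た？", "うん、面白かった。"]
    en = ["Did you see this?", "Yes, it was interesting."]
    for s, t in build_context_examples(ja, en):
        print(s, "=>", t)
```

In the usage example, the second Japanese turn omits its subject, so a sentence-level model has no way to know what "it" refers to; prepending the previous turn gives the model that information. Larger `context_size` values add more history at the cost of longer inputs.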

© 2021 The Association for Natural Language Processing