Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
34th (2020)
Session ID : 3Rin4-26

Systematic Analysis of Linguistic Phenomena for Better Understanding Translation on User-Generated Contents
*Ryo FUJII, Masato MITA, Kaori ABE, Kazuaki HANAWA, Makoto MORISHITA, Jun SUZUKI, Kentaro INUI

Abstract

Neural Machine Translation (NMT) has improved drastically in quality when translating clean input. However, it still struggles with noisy input, such as User-Generated Content (UGC) on the Internet. For NMT systems to be truly useful in promoting cross-cultural communication, one of the most promising directions is to handle such input correctly. Yet it remains an open question what causes the large performance gap between translating clean input and translating UGC. In this paper, we conducted a systematic analysis of a current dataset focused on UGC and clarified which linguistic phenomena greatly affect translation performance.

© 2020 The Japanese Society for Artificial Intelligence