2022 Volume 29 Issue 2 Pages 493-514
This study proposes a support tool for building zero-pronoun evaluation sets, called the zero-pronoun annotation support tool (0Past; pronounced zero-past). The tool provides a chat-like user interface that helps human annotators navigate the data. Each conversation is displayed separately, and when the user views a conversation, its messages are displayed individually, with the newest message shown in a distinct color. Using 0Past, two zero-pronoun evaluation sets are constructed. These evaluation sets are then used to evaluate how well neural machine translation (NMT) models translate Japanese conversations into English with the correct pronouns. Additionally, this study builds a zero-pronoun classification model by incorporating the newly constructed evaluation sets, which enables the tool to provide automated pre-annotations that human annotators can then improve manually. Finally, this study trains a Japanese-English NMT model and compares its performance with that of two publicly available pretrained models in translating parallel conversational sentences, which contain many omitted pronouns, from Japanese to English. The results confirm that phenomenon-specific evaluation sets are essential for better measuring the performance of NMT models on conversational Japanese sentences, in which anaphoric zero pronouns are frequent.