Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 4N3-GS-6-03

Multi-Source Text Classification for Multilingual Language Models with Machine Translation
*Reon KAJIKAWA, Keiichiro YAMADA, Tomoyuki KAJIWARA, Takashi NINOMIYA
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Pre-trained multilingual sentence encoders are a promising way to spare developers of natural language processing applications the cost of training a separate model for each language. However, because the training corpora for these encoders contain only a small amount of text in languages other than English, their performance degrades on non-English languages. To improve the performance of pre-trained multilingual sentence encoders on non-English languages, we propose machine-translating the source sentence into English and then inputting the translation together with the source sentence in a multi-source manner. Experimental results on sentiment analysis and topic classification tasks in Japanese show that the proposed method is effective.
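
The multi-source input described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two sources are combined, so pairing the source sentence with its translation and joining them with a separator is only an assumption here. The names `build_multi_source_inputs`, `join_pair`, `toy_translate`, and the `" </s> "` separator are hypothetical, and the dictionary lookup merely stands in for a real machine translation system.

```python
from typing import Callable, Dict, List, Tuple


def build_multi_source_inputs(
    sentences: List[str],
    translate_to_english: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each source sentence with its English machine translation."""
    return [(src, translate_to_english(src)) for src in sentences]


def join_pair(pair: Tuple[str, str], sep: str = " </s> ") -> str:
    """Join source and translation into one string (assumed combination scheme)."""
    src, en = pair
    return f"{src}{sep}{en}"


# Stand-in for a machine translation system: a fixed lookup table.
TOY_TRANSLATIONS: Dict[str, str] = {
    "この映画は最高だった。": "This movie was great.",
}


def toy_translate(sentence: str) -> str:
    # Fall back to the untranslated sentence when no entry exists.
    return TOY_TRANSLATIONS.get(sentence, sentence)


pairs = build_multi_source_inputs(["この映画は最高だった。"], toy_translate)
joined = [join_pair(p) for p in pairs]
```

With a real encoder, each pair would normally be passed through the model's own tokenizer as two segments rather than through a hand-written separator, and the resulting representation would feed a classification head for sentiment or topic labels.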

© 2024 The Japanese Society for Artificial Intelligence