Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 3Xin4-65

Verifying the Effectiveness of Sentence Embedding Learning in Japanese based on Contrastive Learning with Non-linguistic Data
*Hirofumi SHIMIZU, Daisuke KAWAHARA
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Sentence embeddings learned from text are widely used for tasks such as semantic textual similarity and the automatic evaluation of text generation. SimCSE, a sentence embedding method based on contrastive learning, achieves high accuracy on the semantic textual similarity task. VisualCSE and AudioCSE, both derivatives of SimCSE, add training on image and audio data, respectively, to the text-based training and have been shown to further improve accuracy in English. However, these methods that use non-linguistic data have not been validated in Japanese. This study examines the effectiveness of VisualCSE in Japanese. In our experiments, VisualCSE in Japanese did not show the significant improvement in accuracy seen in the English experiments. We also analyze how sentence embedding learning is affected when noise data is used in place of image data.
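
For readers unfamiliar with contrastive learning, the following is a minimal sketch of the in-batch contrastive (InfoNCE) objective that methods in the SimCSE family are generally described as using. The abstract does not state the loss, so the exact form, the function name `info_nce_loss`, the temperature value, and the toy random vectors are all assumptions made for illustration, not the authors' implementation. Each row of `positives` is treated as the positive example for the matching row of `anchors`, and the other rows in the batch serve as negatives.

```python
import numpy as np


def info_nce_loss(anchors, positives, temperature=0.05):
    """Mean in-batch contrastive (InfoNCE) loss.

    anchors, positives: arrays of shape (batch, dim). Row i of `positives`
    is the positive for row i of `anchors`; all other rows are negatives.
    The temperature default is an assumed value for this sketch.
    """
    # L2-normalise so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    logits = a @ p.T / temperature
    # Subtract the row maximum for numerical stability of the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The correct "class" for row i is column i (the diagonal).
    return float(-np.mean(np.diag(log_probs)))


# Toy data (hypothetical): stand-ins for encoder outputs.
rng = np.random.default_rng(0)
text_a = rng.normal(size=(8, 16))
text_b = text_a + 0.01 * rng.normal(size=(8, 16))  # near-duplicate "views"
noise = rng.normal(size=(8, 16))                   # unrelated vectors

aligned_loss = info_nce_loss(text_a, text_b)
random_loss = info_nce_loss(text_a, noise)
print(aligned_loss < random_loss)
```

In this toy setting, pairs that are nearly identical yield a lower loss than pairs of unrelated random vectors, which is the property the objective is meant to reward during training. In a real setup the vectors would come from a trained encoder rather than a random generator.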

© 2023 The Japanese Society for Artificial Intelligence