Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 4Xin1-38
Improvement of masked language model by Vokenization considering diversity of assigned images
*Shota HIRAI, Masayasu MURAOKA, Naoaki OKAZAKI
Abstract

Visual information plays an important role in language acquisition by humans. While most of the large language models (LLMs) that have succeeded in various NLP tasks are trained only on textual data, the work on Vokenization established a new way of incorporating visual information into LLM training to improve LLM performance on NLP tasks. However, the Vokenization process can assign the same image to different tokens within a sentence, which prevents the LLM from learning effective word representations. In this study, to further improve the performance of the LLM, we propose a method that diversifies the images assigned to tokens during LLM training by exploiting top-k or top-p sampling. Experimental results demonstrated the effectiveness of our method on GLUE, an English language understanding benchmark, where it outperformed the baseline method that used top-1 retrieval in Vokenization.
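The core idea of replacing top-1 image retrieval with top-k or top-p (nucleus) sampling can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the `sample_image` function and its signature are hypothetical, and it assumes each token comes with a vector of raw token-image retrieval scores.

```python
import math
import random

def softmax(scores):
    # Convert raw token-image retrieval scores to a probability distribution.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def sample_image(scores, k=None, p=None, rng=random):
    """Pick an image index for a token by top-k or top-p (nucleus) sampling
    over retrieval scores; with k=None and p=None this falls back to the
    top-1 retrieval baseline."""
    probs = softmax(scores)
    # Candidate image indices, sorted by descending probability.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    if k is not None:
        # Top-k: restrict to the k highest-scoring images.
        pool = order[:k]
    elif p is not None:
        # Top-p: smallest prefix whose cumulative probability reaches p.
        pool, cum = [], 0.0
        for i in order:
            pool.append(i)
            cum += probs[i]
            if cum >= p:
                break
    else:
        return order[0]  # top-1 retrieval (the baseline)
    # Renormalize over the pool and draw one image at random.
    total = sum(probs[i] for i in pool)
    r = rng.random() * total
    acc = 0.0
    for i in pool:
        acc += probs[i]
        if r <= acc:
            return i
    return pool[-1]
```

Because the same token can now receive different (but still highly ranked) images across sentences and epochs, the visual supervision seen during training becomes more diverse than under deterministic top-1 retrieval.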

© 2023 The Japanese Society for Artificial Intelligence