Host: The Japanese Society for Artificial Intelligence
Name: The 37th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 37
Location: [in Japanese]
Date: June 06, 2023 - June 09, 2023
This paper presents a computational model that mimics human word learning through cross-situational learning. Humans acquire word meanings by forming categories from observed information about attributes such as color and shape. The proposed model learns to recognize attributes in images and to associate attribute categories with words. To achieve this, we combine CSL-PGM, which performs cross-situational learning, with β-VAE, which enables unsupervised disentanglement of attributes. In our experiments, we trained the model on a dataset of images with five attributes, paired with word sequences. The model achieved an attribute comprehension rate of 99.9% for each word. It also outperformed existing multimodal generative models, achieving an 87.0% correct response rate when inferring images from word sequences.
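The idea of cross-situational learning mentioned in the abstract can be illustrated with a minimal sketch. This is not the CSL-PGM or the β-VAE used in the paper: it is a plain co-occurrence counter, and it assumes the attribute categories are already given by hand rather than learned from images. The toy data, the function name `learn_word_meanings`, and the scoring rule are all hypothetical choices made for illustration only.

```python
from collections import Counter, defaultdict

# Toy situations (made up for illustration): each pairs the attribute
# categories observed for one object with an unordered utterance.
SITUATIONS = [
    ({"color": "red", "shape": "ball"}, ["red", "ball"]),
    ({"color": "red", "shape": "cube"}, ["cube", "red"]),
    ({"color": "blue", "shape": "ball"}, ["blue", "ball"]),
    ({"color": "blue", "shape": "cube"}, ["blue", "cube"]),
]


def learn_word_meanings(situations):
    """Map each word to the (attribute, category) pair it co-occurs with most reliably."""
    cooccurrence = defaultdict(Counter)  # word -> Counter of (attribute, category)
    category_count = Counter()           # (attribute, category) -> number of situations

    for attributes, words in situations:
        pairs = list(attributes.items())
        category_count.update(pairs)
        for word in set(words):
            cooccurrence[word].update(pairs)

    meanings = {}
    for word, counts in cooccurrence.items():
        # Score: of the situations showing this category, the fraction in
        # which the word was also heard. A single situation is ambiguous,
        # but the correct pairing accumulates the highest score across many.
        meanings[word] = max(
            counts, key=lambda pair: counts[pair] / category_count[pair]
        )
    return meanings


if __name__ == "__main__":
    for word, (attribute, category) in sorted(learn_word_meanings(SITUATIONS).items()):
        print(f"{word!r} -> {attribute}={category}")
```

No single situation tells the learner whether "red" names the color or the shape. The ambiguity resolves only because "red" keeps appearing whenever the color is red while the shape varies. The model described in the abstract addresses the harder setting, in which the categories themselves must be formed from images.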