Host: The Japanese Society for Artificial Intelligence
Name : The 32nd Annual Conference of the Japanese Society for Artificial Intelligence, 2018
Number : 32
Location : [in Japanese]
Date : June 05, 2018 - June 08, 2018
Multimodal data, including images, sounds, and text, is accumulating on the Internet. A general-purpose data representation would make it possible to perform tasks such as discrimination, generation, and retrieval across datasets of various modalities. The key idea for acquiring such a representation is to embed each data point from the data space of its modality as a point in a common space. However, when data is embedded as a single point, it becomes difficult to express the ambiguity of its meaning and the inclusion relations among data. The representation of a data item does not necessarily have to be a point. In this study, we embed images and text as normal distributions in a common space. This improves image retrieval performance.
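To illustrate why a distribution can express ambiguity and inclusion where a point cannot, the following is a minimal sketch, not the method of this study. It assumes diagonal-covariance normal distributions and uses the Kullback-Leibler divergence as the comparison measure; the abstract does not state which covariance structure or measure is actually used. The names `DiagGaussian` and `kl_divergence`, and the "dog" and "animal" values, are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DiagGaussian:
    """A normal distribution with diagonal covariance (an assumption of this sketch)."""
    mean: Sequence[float]
    var: Sequence[float]  # per-dimension variances, all must be positive


def kl_divergence(p: DiagGaussian, q: DiagGaussian) -> float:
    """KL(p || q) for two diagonal-covariance normal distributions."""
    if not (len(p.mean) == len(p.var) == len(q.mean) == len(q.var)):
        raise ValueError("all mean and variance vectors must have the same length")
    total = 0.0
    for mp, vp, mq, vq in zip(p.mean, p.var, q.mean, q.var):
        if vp <= 0 or vq <= 0:
            raise ValueError("variances must be positive")
        total += vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
    return 0.5 * total


# Hypothetical embeddings: a narrow (specific) concept and a broad (ambiguous) one.
dog = DiagGaussian(mean=[0.5, 0.5], var=[0.25, 0.25])
animal = DiagGaussian(mean=[0.0, 0.0], var=[4.0, 4.0])

specific_to_broad = kl_divergence(dog, animal)
broad_to_specific = kl_divergence(animal, dog)
print(specific_to_broad < broad_to_specific)
```

In this sketch the variance plays the role of ambiguity: the broad distribution covers the narrow one, so the divergence from narrow to broad is smaller than the reverse. A point embedding has no counterpart to this asymmetry, since the distance between two points is the same in both directions.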