Host: The Japanese Society for Artificial Intelligence
Name: The 36th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 36
Location: [in Japanese]
Date: June 14, 2022 - June 17, 2022
Representation learning of multi-modal data has the potential to uncover structure shared across modalities. The objective of this study is to develop a computational framework that learns to extract latent representations from multi-modal data using a deep generative model. Each modality is assumed to hold low-dimensional latent representations; however, these representations are not always fully shared with other modalities. We therefore assume that each modality holds both shared and private latent representations. Under this assumption, we propose a deep generative model that learns to extract these two kinds of latent representations, from both non-time-series and time-series data, in an end-to-end manner. To evaluate this framework, we conducted a simulation experiment using an artificial multi-modal dataset of images and strokes containing both shared and private information. Experimental results demonstrate that the proposed framework successfully learned to extract both the shared and private latent representations.
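To make the shared/private factorization concrete, the sketch below shows one way such a model could be set up as a variational autoencoder over two modalities. This is an illustrative reconstruction, not the authors' implementation: all dimensions, the product-of-experts fusion of the shared posteriors, the MSE reconstruction terms, and every name (MultimodalVAE, D_IMG, D_STROKE, etc.) are assumptions, and the time-series variant mentioned in the abstract is omitted.

```python
# Minimal sketch (assumed, not the paper's code): a two-modality VAE whose
# latent code is split into a shared part, fused across modalities, and a
# private part kept separate per modality.
import torch
import torch.nn as nn
import torch.nn.functional as F

D_IMG, D_STROKE = 64, 32    # flattened modality dimensions (assumed)
D_SHARED, D_PRIVATE = 8, 4  # latent sizes (assumed)


class Encoder(nn.Module):
    """Maps one modality to Gaussian parameters for its shared+private latents."""
    def __init__(self, d_in):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU())
        self.mu = nn.Linear(128, D_SHARED + D_PRIVATE)
        self.logvar = nn.Linear(128, D_SHARED + D_PRIVATE)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)


class MultimodalVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc_img, self.enc_str = Encoder(D_IMG), Encoder(D_STROKE)
        self.dec_img = nn.Sequential(
            nn.Linear(D_SHARED + D_PRIVATE, 128), nn.ReLU(), nn.Linear(128, D_IMG))
        self.dec_str = nn.Sequential(
            nn.Linear(D_SHARED + D_PRIVATE, 128), nn.ReLU(), nn.Linear(128, D_STROKE))

    @staticmethod
    def reparameterize(mu, logvar):
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    @staticmethod
    def poe(mu1, lv1, mu2, lv2):
        """Product of two Gaussian experts: one possible fusion of shared posteriors."""
        prec1, prec2 = torch.exp(-lv1), torch.exp(-lv2)
        var = 1.0 / (prec1 + prec2)
        return var * (mu1 * prec1 + mu2 * prec2), torch.log(var)

    def forward(self, x_img, x_str):
        mu_i, lv_i = self.enc_img(x_img)
        mu_s, lv_s = self.enc_str(x_str)
        # Split each posterior into its shared and private parts.
        mu_i_sh, mu_i_pr = mu_i.split([D_SHARED, D_PRIVATE], dim=-1)
        lv_i_sh, lv_i_pr = lv_i.split([D_SHARED, D_PRIVATE], dim=-1)
        mu_s_sh, mu_s_pr = mu_s.split([D_SHARED, D_PRIVATE], dim=-1)
        lv_s_sh, lv_s_pr = lv_s.split([D_SHARED, D_PRIVATE], dim=-1)
        # Fuse only the shared parts across modalities; privates stay separate.
        mu_sh, lv_sh = self.poe(mu_i_sh, lv_i_sh, mu_s_sh, lv_s_sh)
        z_sh = self.reparameterize(mu_sh, lv_sh)
        z_i_pr = self.reparameterize(mu_i_pr, lv_i_pr)
        z_s_pr = self.reparameterize(mu_s_pr, lv_s_pr)
        # Each decoder sees the shared latent plus its own private latent.
        rec_img = self.dec_img(torch.cat([z_sh, z_i_pr], dim=-1))
        rec_str = self.dec_str(torch.cat([z_sh, z_s_pr], dim=-1))
        # ELBO: reconstruction terms plus KL to a standard-normal prior.
        kl = sum(
            -0.5 * torch.sum(1 + lv - mu.pow(2) - lv.exp())
            for mu, lv in [(mu_sh, lv_sh), (mu_i_pr, lv_i_pr), (mu_s_pr, lv_s_pr)]
        )
        rec = (F.mse_loss(rec_img, x_img, reduction="sum")
               + F.mse_loss(rec_str, x_str, reduction="sum"))
        return rec + kl


# Smoke test on random data.
model = MultimodalVAE()
loss = model(torch.randn(16, D_IMG), torch.randn(16, D_STROKE))
loss.backward()
```

The design choice worth noting is that only the shared posterior is fused across modalities (here via a product of Gaussian experts, one common option); keeping the private posteriors per-modality is what lets the model represent information that one modality holds but the other does not.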