Host: The Japanese Society for Artificial Intelligence
Name: The 36th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 36
Location: [in Japanese]
Date: June 14, 2022 - June 17, 2022
Representation learning of multi-modal data has the potential to uncover structure shared across modalities. The objective of this study is to develop a computational framework that learns to extract latent representations from multi-modal data using a deep generative model. Each modality is assumed to hold low-dimensional latent representations; however, these representations are not always fully shared with other modalities. We therefore assume that each modality holds both shared and private latent representations. Under this assumption, we propose a deep generative model that learns to extract these two kinds of latent representations, from both non-time-series and time-series data, in an end-to-end manner. To evaluate this framework, we conducted a simulation experiment using an artificial multi-modal dataset of images and strokes containing both shared and private information. Experimental results demonstrate that the proposed framework successfully learned to extract both the shared and private latent representations.
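To make the shared/private factorization concrete, the sketch below shows one way such a model could be set up as a variational autoencoder over two modalities. This is an illustrative reconstruction, not the authors' implementation: all dimensions, the product-of-experts fusion of the shared posteriors, the MSE reconstruction terms, and every name (MultimodalVAE, D_IMG, D_STROKE, etc.) are assumptions, and the time-series variant mentioned in the abstract is omitted.

```python
# Minimal sketch (assumed, not the paper's code): a two-modality VAE whose
# latent code is split into a shared part, fused across modalities, and a
# private part kept separate per modality.
import torch
import torch.nn as nn
import torch.nn.functional as F

D_IMG, D_STROKE = 64, 32    # flattened modality dimensions (assumed)
D_SHARED, D_PRIVATE = 8, 4  # latent sizes (assumed)


class Encoder(nn.Module):
    """Maps one modality to Gaussian parameters for its shared+private latents."""
    def __init__(self, d_in):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU())
        self.mu = nn.Linear(128, D_SHARED + D_PRIVATE)
        self.logvar = nn.Linear(128, D_SHARED + D_PRIVATE)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)


class MultimodalVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc_img, self.enc_str = Encoder(D_IMG), Encoder(D_STROKE)
        self.dec_img = nn.Sequential(
            nn.Linear(D_SHARED + D_PRIVATE, 128), nn.ReLU(), nn.Linear(128, D_IMG))
        self.dec_str = nn.Sequential(
            nn.Linear(D_SHARED + D_PRIVATE, 128), nn.ReLU(), nn.Linear(128, D_STROKE))

    @staticmethod
    def reparameterize(mu, logvar):
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    @staticmethod
    def poe(mu1, lv1, mu2, lv2):
        """Product of two Gaussian experts: one possible fusion of shared posteriors."""
        prec1, prec2 = torch.exp(-lv1), torch.exp(-lv2)
        var = 1.0 / (prec1 + prec2)
        return var * (mu1 * prec1 + mu2 * prec2), torch.log(var)

    def forward(self, x_img, x_str):
        mu_i, lv_i = self.enc_img(x_img)
        mu_s, lv_s = self.enc_str(x_str)
        # Split each posterior into its shared and private parts.
        mu_i_sh, mu_i_pr = mu_i.split([D_SHARED, D_PRIVATE], dim=-1)
        lv_i_sh, lv_i_pr = lv_i.split([D_SHARED, D_PRIVATE], dim=-1)
        mu_s_sh, mu_s_pr = mu_s.split([D_SHARED, D_PRIVATE], dim=-1)
        lv_s_sh, lv_s_pr = lv_s.split([D_SHARED, D_PRIVATE], dim=-1)
        # Fuse only the shared parts across modalities; privates stay separate.
        mu_sh, lv_sh = self.poe(mu_i_sh, lv_i_sh, mu_s_sh, lv_s_sh)
        z_sh = self.reparameterize(mu_sh, lv_sh)
        z_i_pr = self.reparameterize(mu_i_pr, lv_i_pr)
        z_s_pr = self.reparameterize(mu_s_pr, lv_s_pr)
        # Each decoder sees the shared latent plus its own private latent.
        rec_img = self.dec_img(torch.cat([z_sh, z_i_pr], dim=-1))
        rec_str = self.dec_str(torch.cat([z_sh, z_s_pr], dim=-1))
        # ELBO: reconstruction terms plus KL to a standard-normal prior.
        kl = sum(
            -0.5 * torch.sum(1 + lv - mu.pow(2) - lv.exp())
            for mu, lv in [(mu_sh, lv_sh), (mu_i_pr, lv_i_pr), (mu_s_pr, lv_s_pr)]
        )
        rec = (F.mse_loss(rec_img, x_img, reduction="sum")
               + F.mse_loss(rec_str, x_str, reduction="sum"))
        return rec + kl


# Smoke test on random data.
model = MultimodalVAE()
loss = model(torch.randn(16, D_IMG), torch.randn(16, D_STROKE))
loss.backward()
```

The design choice worth noting is that only the shared posterior is fused across modalities (here via a product of Gaussian experts, one common option); keeping the private posteriors per-modality is what lets the model represent information that one modality holds but the other does not.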