Estimating Geometric Quantities of an Encoder's Latent Space: An Information-Geometric Analysis of CNNs and Transformers

赤塚 育海; 村田 昇

doi:10.11517/pjsai.JSAI2025.0_3S1GS204

Abstract

Encoders such as CNNs and Transformers embed high-dimensional objects (e.g., images) into low-dimensional vectors, and many previous studies treat the latent space formed by these embedding vectors as a Euclidean space. In this study, we aim to capture geometric structure of the latent space that may be overlooked under a purely Euclidean assumption. To this end, we propose a method that associates the encoder’s intermediate representations with probability distributions, thereby defining an information-geometric manifold on which geometric quantities such as the metric and curvature can be estimated. The set of distributions obtained by feeding an image dataset into the encoder forms an information-geometric manifold on which the α-divergence plays the role of a distance, and its expectation coordinates coincide with the embedding vectors. In experiments estimating the metric and curvature of the latent space of a CNN trained on the MNIST dataset, we found that the latent space exhibits positive curvature in many regions, indicating that it is not necessarily flat.
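
As a rough illustration of two quantities named in the abstract, the sketch below computes an α-divergence between two discrete distributions and a Fisher information metric for a categorical distribution. The abstract does not say how the method maps intermediate representations to distributions or which α-divergence convention it uses, so everything here is an assumption: the distributions are taken to be softmax outputs of hypothetical encoder activations, Amari's convention is used for the divergence, and the function names are made up for this example. It is not the authors' implementation.

```python
import numpy as np


def softmax(z):
    """Map a real vector (hypothetical encoder activations) to a categorical distribution."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def alpha_divergence(p, q, alpha):
    """Amari alpha-divergence between strictly positive discrete distributions.

    Assumed convention:
        D_alpha(p || q) = 4 / (1 - alpha^2) * (1 - sum_i p_i^((1-alpha)/2) * q_i^((1+alpha)/2))
    with the limits alpha -> -1 giving KL(p || q) and alpha -> +1 giving KL(q || p).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if np.isclose(alpha, -1.0):
        return float(np.sum(p * np.log(p / q)))
    if np.isclose(alpha, 1.0):
        return float(np.sum(q * np.log(q / p)))
    overlap = np.sum(p ** ((1.0 - alpha) / 2.0) * q ** ((1.0 + alpha) / 2.0))
    return float(4.0 / (1.0 - alpha ** 2) * (1.0 - overlap))


def categorical_fisher_metric(p):
    """Fisher information of a softmax categorical with respect to its logits.

    The matrix diag(p) - p p^T is singular because adding a constant to all
    logits leaves the distribution unchanged.
    """
    p = np.asarray(p, dtype=float)
    return np.diag(p) - np.outer(p, p)


# Two hypothetical activation vectors standing in for encoder outputs.
p = softmax([1.0, 0.0, -1.0])
q = softmax([0.0, 0.5, 0.0])
d0 = alpha_divergence(p, q, alpha=0.0)  # symmetric in p and q at alpha = 0
G = categorical_fisher_metric(p)
```

In this convention the divergence is not symmetric in general; it satisfies the duality D_α(p‖q) = D_{−α}(q‖p), which is why it acts as a distance-like quantity and not a true metric distance.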
