Host : The Japanese Society for Artificial Intelligence
Name : The 39th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 39
Location : [in Japanese]
Date : May 27, 2025 - May 30, 2025
In this study, we propose a method that uses NewtonianVAE to learn a latent space representing 6D poses and to perform 6D control. NewtonianVAE is a type of world model that learns the dynamics of the environment as a latent space from observational data and performs proportional control based on the estimated position. With NewtonianVAE, position can be estimated from the internal dynamics of the environment rather than from an external coordinate system. Previous studies have applied NewtonianVAE to translational control, but 6D control has not been investigated. To address this, we propose 6D Multi-View NewtonianVAE (6D-MNVAE), which extends the latent space by incorporating a rotation vector. In our experiments, we evaluated whether 6D-MNVAE can estimate 6D poses in the latent space and perform 6D control towards a target pose. The results showed that 6D-MNVAE achieved 6D control with an accuracy within 7 mm and 0.02 rad. Furthermore, our method requires no feature engineering or annotation and enables 6D control using only RGB images.
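As a rough illustration of proportional control on a 6D latent state, the following minimal sketch drives a pose made of three translational components and a three-component rotation vector towards a target. It is not the 6D-MNVAE implementation: the function names, the gain, the time step, the first-order update, and the component-wise subtraction of rotation vectors (reasonable only for small rotation errors) are all assumptions made for illustration, and the poses here are hand-written numbers rather than latents inferred from RGB images.

```python
import math

def p_control_step(x, x_goal, gain=1.0, dt=0.1):
    """One proportional-control step on a 6D state [px, py, pz, rx, ry, rz].

    Hypothetical helper: the command is proportional to the error, and the
    state is assumed to integrate that command directly (first-order model).
    """
    u = [gain * (g - xi) for xi, g in zip(x, x_goal)]
    return [xi + dt * ui for xi, ui in zip(x, u)]

def run_to_goal(x0, x_goal, steps=100, gain=1.0, dt=0.1):
    """Apply the proportional controller for a fixed number of steps."""
    x = list(x0)
    for _ in range(steps):
        x = p_control_step(x, x_goal, gain, dt)
    return x

def pose_error(x, x_goal):
    """Return (translation error, rotation-vector error) as Euclidean norms."""
    pos = math.sqrt(sum((a - b) ** 2 for a, b in zip(x[:3], x_goal[:3])))
    rot = math.sqrt(sum((a - b) ** 2 for a, b in zip(x[3:], x_goal[3:])))
    return pos, rot

# Illustrative start and target poses (metres and radians, made-up values).
start = [0.10, -0.05, 0.20, 0.30, -0.10, 0.20]
goal = [0.00, 0.00, 0.00, 0.00, 0.00, 0.00]

final = run_to_goal(start, goal)
pos_err, rot_err = pose_error(final, goal)
print(pos_err, rot_err)
```

With `gain * dt` below 1, each step shrinks every error component by the factor `1 - gain * dt`, so the state converges geometrically to the target in this idealised setting.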