Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
39th (2025)
Session ID : 1B3-OS-41a-03

6D Multi-View NewtonianVAE: A World Model-Based Approach for 6D Pose Estimation and Control
*Mai TERASHIMA, Katsuyoshi MAEYAMA, Pedro Miguel Uriguen ELJURI, Yuanyuan JIA, Tadahiro TANIGUCHI
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

In this study, we propose a method that uses NewtonianVAE to learn a latent space representing 6D poses and to perform 6D control. NewtonianVAE is a type of world model that learns the dynamics of the environment as a latent space from observational data and performs proportional control based on the estimated position. Position estimation therefore relies on the internal dynamics of the environment rather than on an external coordinate system. Previous studies have applied NewtonianVAE to translational control, but 6D control has not been investigated. To address this gap, we propose 6D Multi-View NewtonianVAE (6D-MNVAE), which extends the latent space by incorporating a rotation vector. In our experiments, we evaluated whether 6D-MNVAE can estimate 6D poses in the latent space and perform 6D control toward a target pose. The results show that 6D-MNVAE achieved 6D control with an accuracy within 7 mm and 0.02 rad. Furthermore, our method requires no feature engineering or annotation and enables 6D control using only RGB image information.
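
The idea of proportional control toward a target pose in a 6D latent space can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a hypothetical latent vector laid out as three translation components followed by a three-component rotation vector, treats the difference of rotation vectors as the rotational error (reasonable only for small angles), and replaces the learned dynamics with a toy first-order integrator. The names `p_control`, `rollout`, and `pose_errors`, the gain, the time step, and the units (metres and radians) are all assumptions made for illustration.

```python
import numpy as np


def p_control(x, x_goal, gain=1.0):
    """Proportional command toward the goal.

    x and x_goal are hypothetical 6D latents: [tx, ty, tz, rx, ry, rz],
    where (rx, ry, rz) is a rotation vector (axis times angle).
    """
    x = np.asarray(x, dtype=float)
    x_goal = np.asarray(x_goal, dtype=float)
    return gain * (x_goal - x)


def rollout(x0, x_goal, gain=1.0, dt=0.1, steps=200):
    """Apply the command repeatedly with a toy model x <- x + dt * u.

    The first-order integrator stands in for the learned latent dynamics
    (an assumption, not the model from the paper).
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        x = x + dt * p_control(x, x_goal, gain)
    return x


def pose_errors(x, x_goal):
    """Return (translation error, rotation-vector error) as Euclidean norms."""
    diff = np.asarray(x_goal, dtype=float) - np.asarray(x, dtype=float)
    return float(np.linalg.norm(diff[:3])), float(np.linalg.norm(diff[3:]))


# Illustrative start and target poses (made-up values, metres and radians).
x0 = np.zeros(6)
x_goal = np.array([0.10, -0.05, 0.20, 0.3, -0.2, 0.1])

x_final = rollout(x0, x_goal)
pos_err, rot_err = pose_errors(x_final, x_goal)
print(f"translation error: {pos_err:.2e} m, rotation error: {rot_err:.2e} rad")
```

In this toy setting the error shrinks by a factor of `1 - gain * dt` per step, so the final pose lands well inside the 7 mm and 0.02 rad tolerances quoted in the abstract. With a learned model and real images, the achievable accuracy would instead depend on how well the latent space matches the true pose.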

© 2025 The Japanese Society for Artificial Intelligence