6D Multi-View NewtonianVAE: A World Model-Based Approach for 6D Pose Estimation and Control

Mai Terashima, Katsuyoshi Maeyama, Pedro Miguel Uriguen Eljuri, Yuanyuan Jia, Tadahiro Taniguchi

doi:10.11517/pjsai.JSAI2025.0_1B3OS41a03

39th (2025)

Session ID : 1B3-OS-41a-03

Conference information

Host: The Japanese Society for Artificial Intelligence

Name : The 39th Annual Conference of the Japanese Society for Artificial Intelligence

Number : 39

Date : May 27, 2025 - May 30, 2025

Keywords: 6D control, World model, Visual feedback control, Multi-view image information

Abstract

In this study, we propose a method that uses NewtonianVAE to learn a latent space representing 6D poses and to perform 6D control in that space. NewtonianVAE is a world model that learns the dynamics of the environment as a latent space from observation data, which enables proportional control based on the position estimated in that space. With NewtonianVAE, positions are estimated from the internal dynamics of the environment rather than in an external coordinate system. Previous studies have applied NewtonianVAE to translational control, but 6D control has not yet been investigated. To address this gap, we propose 6D Multi-View NewtonianVAE (6D-MNVAE), which extends the latent space to include a rotation vector. In our experiments, we evaluated whether 6D-MNVAE can estimate 6D poses in the latent space and drive the system toward a target pose. The results showed that 6D-MNVAE achieved 6D control with errors within 7 mm in position and 0.02 rad in orientation. Furthermore, the method requires neither feature engineering nor annotation and enables 6D control using only RGB images.
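The abstract does not spell out the latent dynamics or the control law, so the following is only a minimal sketch, under stated assumptions, of why proportional control is natural in a NewtonianVAE-style latent space and how a 6D latent state with a rotation vector might be handled. It assumes, hypothetically, that the latent state is a 6-vector x = [position (3), rotation vector (3)]; that it follows a damped Newtonian transition v' = v + dt(Ax + Bv + Cu), x' = x + dt v' (our reading of the original NewtonianVAE formulation), with constant scalar coefficients A, B, C instead of the network-predicted diagonal matrices of a learned model; and that the control input is u = K(x_goal - x). The helper names (newtonian_step, p_control, rotvec_to_matrix, rotation_error, position_error, reach) and all constants are ours, not the paper's, and latent units are treated directly as metres and radians purely for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]

# Hypothetical constants (not from the paper). In a learned NewtonianVAE,
# A, B and C would be diagonal matrices predicted by networks.
DT = 0.1   # time step
A = 0.0    # position ("spring") coefficient
B = -2.0   # velocity (damping) coefficient
C = 1.0    # control-input gain
K = 1.0    # proportional gain


def newtonian_step(x: Sequence[float], v: Sequence[float],
                   u: Sequence[float]) -> Tuple[Vec, Vec]:
    """One latent transition: v' = v + dt*(A x + B v + C u), x' = x + dt*v'."""
    v_next = [vi + DT * (A * xi + B * vi + C * ui) for xi, vi, ui in zip(x, v, u)]
    x_next = [xi + DT * vn for xi, vn in zip(x, v_next)]
    return x_next, v_next


def p_control(x: Sequence[float], x_goal: Sequence[float]) -> Vec:
    """Proportional control in latent space: u = K (x_goal - x), per dimension."""
    return [K * (g - xi) for xi, g in zip(x, x_goal)]


def rotvec_to_matrix(r: Sequence[float]) -> List[Vec]:
    """Rodrigues' formula: rotation vector (unit axis * angle) -> 3x3 matrix."""
    theta = math.sqrt(sum(ri * ri for ri in r))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (ri / theta for ri in r)
    c, s = math.cos(theta), math.sin(theta)
    t = 1.0 - c
    return [
        [c + kx * kx * t, kx * ky * t - kz * s, kx * kz * t + ky * s],
        [ky * kx * t + kz * s, c + ky * ky * t, ky * kz * t - kx * s],
        [kz * kx * t - ky * s, kz * ky * t + kx * s, c + kz * kz * t],
    ]


def rotation_error(r_a: Sequence[float], r_b: Sequence[float]) -> float:
    """Angle (rad) of the relative rotation R_a^T R_b."""
    ra, rb = rotvec_to_matrix(r_a), rotvec_to_matrix(r_b)
    # trace(R_a^T R_b) equals the element-wise (Frobenius) inner product.
    tr = sum(ra[i][j] * rb[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard acos against floating-point overshoot.
    return math.acos(max(-1.0, min(1.0, (tr - 1.0) / 2.0)))


def position_error(p_a: Sequence[float], p_b: Sequence[float]) -> float:
    """Euclidean distance between two 3D positions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p_a, p_b)))


def reach(x0: Sequence[float], x_goal: Sequence[float], steps: int = 200) -> Vec:
    """Roll out the latent dynamics under P-control, starting at rest."""
    x, v = list(x0), [0.0] * len(x0)
    for _ in range(steps):
        x, v = newtonian_step(x, v, p_control(x, x_goal))
    return x


if __name__ == "__main__":
    # Latent 6D state: [px, py, pz, rx, ry, rz] (position, then rotation vector).
    start = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    goal = [0.05, -0.02, 0.10, 0.0, 0.0, 0.3]
    final = reach(start, goal)
    print(f"position error: {position_error(final[:3], goal[:3]):.2e}")
    print(f"rotation error: {rotation_error(final[3:], goal[3:]):.2e} rad")
```

With these constants the closed loop behaves like a critically damped spring, so both errors shrink toward zero over the rollout. Subtracting rotation vectors component-wise, as p_control does, is a reasonable error signal only for small rotations away from the ±π wrap-around; rotation_error instead measures the angle of the relative rotation, which is one common way to report an orientation error in radians. The abstract does not state which metric was used for the reported 0.02 rad.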
