Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 1G4-OS-21a-02

Construction and Validation of Action-Conditioned VideoGPT
*Koudai TABATA, Junnosuke KAMOHARA, Ryosuke UNNO, Makoto SATO, Taiju WATANABE, Taiga KUME, Masahiro NEGISHI, Ryo OKADA, Yusuke IWASAWA, Yutaka MATSUO
Abstract

World models acquire the structure of the external world from observations and can predict its future states as it changes with the agent's actions. Recent advances in generative models and language models have contributed to multi-modal world models, which are expected to find applications in various domains, including automated driving and robotics. Video prediction is a field that has made progress toward high-fidelity, long-term prediction, and world models have potential applications in acquiring temporal representations. One model architecture that has performed well combines an encoder-decoder-based latent variable model for image reconstruction with an autoregressive model for predicting the latent sequence. In this work, we extend VideoGPT, a video prediction model that combines VQ-VAE and Image-GPT, by introducing action conditioning. Validation on CARLA and RoboNet showed improved performance compared to the model without action conditioning.
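To illustrate the action-conditioning idea described above, the following is a minimal, hypothetical sketch in NumPy: an autoregressive step over VQ-VAE latent tokens in which the agent's action is projected into the token-embedding space and fused by addition. All shapes, names, and the additive fusion choice are illustrative assumptions, not the authors' exact model.

```python
import numpy as np

# Hypothetical toy dimensions (NOT from the paper): codebook size,
# token-embedding dim, action dim, all chosen small for illustration.
VOCAB, D_TOK, D_ACT = 16, 8, 4

rng = np.random.default_rng(0)
tok_embed = rng.normal(size=(VOCAB, D_TOK))  # VQ codebook-token embeddings
act_proj = rng.normal(size=(D_ACT, D_TOK))   # maps action into token space
out_head = rng.normal(size=(D_TOK, VOCAB))   # logits over the next latent token

def next_token_logits(token_id, action):
    """One autoregressive prior step: fuse the current latent token with
    the agent's action (additive conditioning, a common simple choice)."""
    h = tok_embed[token_id] + action @ act_proj
    return h @ out_head

action = rng.normal(size=D_ACT)        # e.g. steering/throttle in CARLA
logits = next_token_logits(3, action)  # distribution over next latent token
pred = int(np.argmax(logits))          # greedy pick of the next token
```

In the full model the prior is a transformer (Image-GPT) over the whole latent sequence; this sketch only shows where the action signal enters a single prediction step.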

© 2023 The Japanese Society for Artificial Intelligence