Host : The Japanese Society for Artificial Intelligence
Name : The 35th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 35
Location : [in Japanese]
Date : June 08, 2021 - June 11, 2021
Language models such as GPT-2 and BERT improve performance on language understanding and language generation tasks. These models have begun to be shown applicable not only to language but also to non-linguistic data such as images and audio. By discretizing continuous data with VQ-VAE, continuous data can be handled by language models in the same way as language data. We believe that this discretization, followed by learning the resulting discrete sequences with a language model, can be applied to various types of data. The purpose of this study is to verify the modeling of human motion data using VQ-VAE and GPT-2. In our experiments, we trained VQ-VAE and GPT-2 on the CMU-mocap and 3DPW motion-capture datasets. We validated the learned models by forecasting future motion from an input of the current few frames.
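The discretize-then-model pipeline described above can be sketched in minimal form: continuous motion frames are mapped to the indices of their nearest codebook vectors (the VQ step of VQ-VAE), yielding a token sequence a language model such as GPT-2 can ingest; decoding maps tokens back to continuous frames. This is a toy NumPy sketch under assumed shapes and a random codebook — the actual study uses VQ-VAE's learned encoder, decoder, and codebook, which are not reproduced here, and the function names are illustrative.

```python
import numpy as np

def quantize(frames, codebook):
    # frames: (T, D) continuous motion frames; codebook: (K, D) code vectors.
    # Assign each frame to the index of its nearest codebook vector (VQ step).
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (T, K)
    return dists.argmin(axis=1)  # discrete token sequence of length T

def decode(tokens, codebook):
    # Stand-in for the VQ-VAE decoder: look up the code vector for each token.
    return codebook[tokens]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 3))   # K=8 codes, D=3 features (toy sizes, not the paper's)
motion = rng.normal(size=(10, 3))    # 10 frames of continuous "motion" data
tokens = quantize(motion, codebook)  # token sequence for an autoregressive model
recon = decode(tokens, codebook)     # coarse reconstruction of the motion
```

In the study's setting, the `tokens` sequence would be fed to GPT-2, which is trained to predict the next token; forecasting then amounts to conditioning on the tokens of the current few frames, sampling future tokens autoregressively, and decoding them back to motion.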