Host: The Japanese Society for Artificial Intelligence
Name: The 99th SIG-SLUD
Number: 99
Location: [in Japanese]
Date: December 13, 2023 - December 14, 2023
Pages: 07-12
Respiration is closely related to speech, so respiratory information can help improve multimodal spoken dialogue systems in several ways. A machine-learning task for multimodal spoken dialogue systems is presented that aims to improve the compatibility of these systems and promote smooth interaction with them. The task consists of two subtasks: waveform amplitude estimation and waveform gradient estimation. A dataset containing respiratory data from 30 participants was created for this task, and a strong baseline method based on 3DCNN-ConvLSTM was evaluated on it. Finally, the task was shown to be effective for predicting user voice activity 200 ms ahead. These results suggest that the task is useful for improving multimodal spoken dialogue systems.
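The abstract names the two subtasks but does not define their targets. As a minimal sketch, assuming the amplitude target is the respiration signal min-max normalized per recording and the gradient target is its time derivative, the targets could be derived from a uniformly sampled respiration signal as shown below. The function name `make_targets`, the normalization scheme, the 10 Hz sampling rate, and the synthetic 0.25 Hz breathing signal are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple


def make_targets(signal: List[float], fs: float) -> Tuple[List[float], List[float]]:
    """Hypothetical target construction for the two subtasks (assumed, not from the paper).

    amplitude: signal min-max normalized to [0, 1] over the whole recording
    gradient:  first derivative of the normalized signal, in units per second,
               using central differences inside and one-sided differences at the ends
    """
    if len(signal) < 2:
        raise ValueError("need at least two samples")
    lo, hi = min(signal), max(signal)
    span = hi - lo
    # Assumed normalization: a flat signal maps to all zeros.
    amp = [0.0] * len(signal) if span == 0 else [(x - lo) / span for x in signal]

    dt = 1.0 / fs
    n = len(amp)
    grad = []
    for i in range(n):
        if i == 0:
            g = (amp[1] - amp[0]) / dt  # forward difference at the start
        elif i == n - 1:
            g = (amp[-1] - amp[-2]) / dt  # backward difference at the end
        else:
            g = (amp[i + 1] - amp[i - 1]) / (2 * dt)  # central difference
        grad.append(g)
    return amp, grad


# Usage: a synthetic breathing signal (0.25 Hz, i.e. 15 breaths/min) sampled at 10 Hz for 8 s.
fs = 10.0
sig = [math.sin(2 * math.pi * 0.25 * t / fs) for t in range(80)]
amp, grad = make_targets(sig, fs)
```

Under this assumed definition, the sign of the gradient target separates rising from falling phases of the respiration waveform; if the sensor measures chest expansion, a negative gradient would correspond to exhalation, during which speech is typically produced.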