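Organizer: The Japanese Society for Artificial Intelligence
Conference: The 99th Meeting of the Special Interest Group on Spoken Language Understanding and Dialogue Processing
Meeting number: 99
Venue: National Institute for Japanese Language and Linguistics, Auditorium / Online
Dates: 2023/12/13 - 2023/12/14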
pp. 7-12
Respiration is closely related to speech, so respiratory information can help improve multimodal spoken dialogue systems in several ways. We present a machine-learning task for multimodal spoken dialogue systems, intended to improve the compatibility of the systems and promote smooth interaction with them. The task consists of two subtasks: waveform amplitude estimation and waveform gradient estimation. We created a dataset of respiratory data from 30 participants for this task and evaluated a strong baseline method based on 3DCNN-ConvLSTM on it. Finally, we show that the task is effective for predicting user voice activity 200 ms into the future. These results suggest that the task is effective for improving multimodal spoken dialogue systems.