Host: The Japanese Society for Artificial Intelligence
Name: The 33rd Annual Conference of the Japanese Society for Artificial Intelligence, 2019
Number: 33
Location: [in Japanese]
Date: June 04, 2019 - June 07, 2019
This study proposes models for detecting conversation boundaries in group discussions. In the first method, we created a multimodal embedding space using an autoencoder and applied a similarity-based approach to detect conversation boundaries. In the second method, we annotated conversation boundaries and created unimodal CNN models for language, audio, and head motion information. We then created multimodal models by concatenating the outputs of the unimodal models. In the evaluation experiment, we found that language was the most useful modality, but that combining it with the audio and head motion modalities allowed the CNN-based models to predict conversation boundaries more accurately.
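
The two ideas in the abstract, similarity-based boundary detection over an embedding space and fusion by concatenating unimodal outputs, can be illustrated with a minimal sketch. The abstract does not state which similarity measure, threshold, or feature dimensions were used, so cosine similarity, the threshold value of 0.5, and the function names `cosine_similarity`, `detect_boundaries`, and `fuse_features` are all assumptions made for illustration, not the authors' implementation. The embeddings are taken as given; the autoencoder and CNN models themselves are not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def detect_boundaries(embeddings, threshold=0.5):
    """Return indices i where segment i starts a new conversation.

    A boundary is hypothesized between segments i-1 and i whenever their
    embeddings are less similar than `threshold` (a hypothetical value).
    """
    boundaries = []
    for i in range(1, len(embeddings)):
        if cosine_similarity(embeddings[i - 1], embeddings[i]) < threshold:
            boundaries.append(i)
    return boundaries


def fuse_features(language, audio, head_motion):
    """Concatenate unimodal output vectors into one multimodal vector."""
    return list(language) + list(audio) + list(head_motion)


if __name__ == "__main__":
    # Toy 2-dimensional segment embeddings: the topic shifts at index 2.
    segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(detect_boundaries(segments))
    print(fuse_features([0.2], [0.5, 0.1], [0.7]))
```

In this sketch a boundary is reported at the index of the first segment after a drop in similarity. A real system would likely tune the threshold on annotated data and pass the concatenated vector to a classifier rather than use it directly.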