Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 4R2-OS-22a-03
A Simple but Effective Method to Incorporate Multimodal Information for Utterance Relationship Comprehension
*Yasuhito OHSUGI, Yuka OZEKI, Shuhei TATEISHI, Yoshihisa KANOU, Makoto NAKATSUJI
Abstract

Multimodal information such as audio and video can be effective for comprehending relationships between utterances in meetings. To align long audio and video sequences with short text sequences, approaches based on periodic averaging or sampling of the audio and video sequences have been proposed. Such approaches, however, tend to include less meaningful audio and video features within the sampling window. We introduce a method that resamples audio and video embeddings based on attention between those embeddings and a small number of latent features. In particular, these few fixed-length latent features can effectively capture the information in variable-length audio and video sequences. Experiments on the multimodal meeting corpus AMI showed that our multimodal method was comparable with a text-only method in comprehending supportive relationships between utterances.
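The resampling idea described above — a few fixed-length latents attending over a variable-length sequence of audio/video embeddings — can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes a generic scaled dot-product cross-attention (in the style of Perceiver-type resamplers), and the function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def latent_resample(features, latents):
    """Cross-attention resampling (illustrative sketch, not the paper's code).

    features: (T, d) variable-length audio/video embeddings
    latents:  (k, d) small fixed set of (learned) latent features, k << T
    Returns a (k, d) fixed-length summary of the sequence.
    """
    d = features.shape[1]
    # Each latent queries all T frames (scaled dot-product attention).
    scores = latents @ features.T / np.sqrt(d)        # (k, T)
    weights = softmax(scores, axis=-1)                # rows sum to 1
    return weights @ features                         # (k, d)

# Usage: a 50-frame sequence is compressed to 4 latent summaries.
rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 16))
latents = rng.standard_normal((4, 16))
summary = latent_resample(frames, latents)
```

Unlike periodic averaging, the attention weights are content-dependent, so informative frames can dominate the summary regardless of where they fall in the sequence.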

© 2023 The Japanese Society for Artificial Intelligence