Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
36th (2022)
Session ID : 1P1-GS-10-04

Quantifying a Multi-person Meeting based on Multi-modal Micro-behavior Analysis
*Chenhao CHEN, Kosuke TOKUHARA, Yutaka ARAKAWA, Ko WATANABE, Shoya ISHIMARU
Abstract

In this paper, we present an end-to-end online meeting quantification system that accurately detects and quantifies three micro-behavior indicators, speaking, nodding, and smiling, for online meeting evaluation. For active speaker detection (ASD), we build a multi-modal neural network framework consisting of audio and video temporal encoders, an audio-visual cross-attention mechanism for inter-modality interaction, and a self-attention mechanism that captures long-term speaking evidence. For nodding detection, we estimate head pitch angles as the nodding feature using the WHENet framework from the field of head pose estimation (HPE), and then build a gated recurrent unit (GRU) network with a squeeze-and-excitation (SE) module to recognize nodding movements in video. Finally, we use a Haar cascade classifier for smile detection. Experimental results with k-fold cross-validation show that the three detection modules achieve F1-scores of 94.9%, 79.67%, and 71.19%, respectively.
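The abstract does not give implementation details, so the following are minimal sketches of two of the three modules under stated assumptions; class names, cascade parameters, hidden sizes, and thresholds are illustrative, not the authors' configuration.

A smile detector built from OpenCV's pre-trained Haar cascades (face first, then smile inside the face region), assuming per-frame BGR input:

```python
import cv2

# OpenCV ships pre-trained Haar cascades; the scale factors and minNeighbors
# values below are illustrative defaults, not the paper's settings.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
smile_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_smile.xml")

def detect_smile(frame):
    """Return True if a smile is detected inside any detected face region."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in faces:
        roi = gray[y:y + h, x:x + w]
        smiles = smile_cascade.detectMultiScale(roi, scaleFactor=1.7, minNeighbors=20)
        if len(smiles) > 0:
            return True
    return False
```

A nodding classifier sketch in PyTorch, assuming the WHENet head-pitch angles have already been extracted into a per-frame sequence; the SE block re-weights GRU feature channels before a binary nod / no-nod head:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation over the feature (channel) dimension."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                 # x: (batch, time, channels)
        s = x.mean(dim=1)                 # squeeze: average over time
        w = self.fc(s).unsqueeze(1)       # excitation: per-channel weights
        return x * w                      # re-weight features

class NodClassifier(nn.Module):
    """GRU over a head-pitch-angle sequence with an SE module (hypothetical layout)."""
    def __init__(self, hidden=32):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.se = SEBlock(hidden)
        self.head = nn.Linear(hidden, 2)

    def forward(self, pitch_seq):          # pitch_seq: (batch, time, 1)
        h, _ = self.gru(pitch_seq)         # temporal features from pitch angles
        h = self.se(h)                     # channel re-weighting via SE
        return self.head(h[:, -1])         # classify from the last time step
```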

© 2022 The Japanese Society for Artificial Intelligence