2015, Vol. 19, No. 1, pp. 53-67
To develop an automatic emotion estimation system based on speaker information collected during face-to-face conversation, an extensive exploration of the multimodal features of speakers is required. To satisfy this requirement, a multimodal Japanese dialog corpus with dynamic emotional states was created by recording the vocal and facial expressions and physiological reactions of various speakers. Estimation experiments based on a mixed-effect model and multiple regression analysis were conducted to elucidate the relevant features for speaker-independent and speaker-specific emotion estimation. The results revealed that vocal features were most relevant for speaker-independent emotion estimation, whereas facial features were most relevant for speaker-specific emotion estimation.
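The abstract names two analysis tools: a mixed-effect model (for the speaker-independent view) and multiple regression (for the speaker-specific view). The sketch below illustrates, under assumptions, how such an analysis could be set up in Python with statsmodels; the feature columns (vocal, facial, physio), speaker IDs, and the synthetic data are invented for illustration and are not taken from the paper's corpus or feature set.

```python
# Hypothetical sketch: relating multimodal features to an emotion rating with
# (a) a mixed-effect model treating speaker as a random effect and
# (b) a separate multiple regression per speaker.
# Column names and data are invented for illustration only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_speakers, n_utts = 8, 50
rows = []
for spk in range(n_speakers):
    bias = rng.normal(0, 0.5)            # speaker-specific offset
    for _ in range(n_utts):
        vocal = rng.normal()              # e.g., an F0/intensity summary (assumed)
        facial = rng.normal()             # e.g., a facial-expression score (assumed)
        physio = rng.normal()             # e.g., a physiological signal level (assumed)
        emotion = 0.6 * vocal + 0.3 * facial + 0.1 * physio + bias + rng.normal(0, 0.3)
        rows.append(dict(speaker=f"s{spk}", vocal=vocal, facial=facial,
                         physio=physio, emotion=emotion))
df = pd.DataFrame(rows)

# (a) Speaker-independent analysis: random intercept per speaker (mixed-effect model)
mixed = smf.mixedlm("emotion ~ vocal + facial + physio", df, groups=df["speaker"]).fit()
print(mixed.summary())

# (b) Speaker-specific analysis: multiple regression fitted per speaker
for spk, g in df.groupby("speaker"):
    ols = smf.ols("emotion ~ vocal + facial + physio", data=g).fit()
    print(spk, ols.params.round(2).to_dict())
```

Comparing the fixed-effect coefficients in (a) with the per-speaker coefficients in (b) mirrors the paper's contrast between features relevant for speaker-independent versus speaker-specific estimation, though the actual feature definitions and model details are those reported in the paper, not this sketch.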