IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Regular Section
A Two-Stage Attention Based Modality Fusion Framework for Multi-Modal Speech Emotion Recognition
Dongni HU, Chengxin CHEN, Pengyuan ZHANG, Junfeng LI, Yonghong YAN, Qingwei ZHAO
Journal Free (open access)

2021, Volume E104.D, Issue 8, pp. 1391-1394

Abstract

Recently, automated recognition and analysis of human emotion have attracted increasing attention from multidisciplinary communities. However, it remains challenging to simultaneously utilize emotional information from multiple modalities. Previous studies have explored different fusion methods, but they mainly focused on either inter-modality interaction or intra-modality interaction. In this letter, we propose a novel two-stage fusion strategy named modality attention flow (MAF) to model the intra- and inter-modality interactions simultaneously in a unified end-to-end framework. Experimental results show that the proposed approach outperforms widely used late fusion methods, and achieves even better performance as the number of stacked MAF blocks increases.
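The abstract only names the approach; the letter's actual architecture is not reproduced here. Below is a minimal, hypothetical sketch of what one stacked "intra- then inter-modality attention" block could look like, assuming audio and text modalities and standard multi-head attention. All names (MAFBlock, self_attn_a, etc.), dimensions, and the pooling at the end are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a two-stage fusion block in the spirit of the
# letter's modality attention flow (MAF): stage 1 applies self-attention
# within each modality (intra-modality interaction); stage 2 applies
# cross-attention between modalities (inter-modality interaction).
import torch
import torch.nn as nn


class MAFBlock(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        # Stage 1: intra-modality self-attention, one module per modality.
        self.self_attn_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn_t = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Stage 2: inter-modality cross-attention in both directions.
        self.cross_attn_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn_t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, audio, text):
        # audio, text: (batch, seq_len, dim) feature sequences.
        # Stage 1: each modality attends to itself (with residual connection).
        a = audio + self.self_attn_a(audio, audio, audio)[0]
        t = text + self.self_attn_t(text, text, text)[0]
        # Stage 2: audio queries text, text queries the updated audio.
        a = self.norm_a(a + self.cross_attn_a(a, t, t)[0])
        t = self.norm_t(t + self.cross_attn_t(t, a, a)[0])
        return a, t


# Usage: blocks can be stacked, mirroring the letter's observation that
# performance improves as the number of MAF blocks grows.
blocks = nn.ModuleList([MAFBlock() for _ in range(3)])
audio = torch.randn(8, 100, 256)  # e.g. frame-level acoustic features
text = torch.randn(8, 30, 256)    # e.g. token-level lexical features
for blk in blocks:
    audio, text = blk(audio, text)
# One possible utterance-level fusion for a downstream emotion classifier.
fused = torch.cat([audio.mean(dim=1), text.mean(dim=1)], dim=-1)
```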

© 2021 The Institute of Electronics, Information and Communication Engineers