IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532

This article has an officially published version. Please refer to the published version, and cite the published version when citing this work.

Multimodal Speech Emotion Recognition Based on Large Language Model
Congcong FANG, Yun JIN, Guanlin CHEN, Yunfan ZHANG, Shidang LI, Yong MA, Yue XIE
Journal: Free access, Advance online publication

Article ID: 2024EDL8034

Abstract

Currently, an increasing number of speech emotion recognition tasks rely on the analysis of both speech and text features. However, little research has explored the potential of large language models such as GPT-3 to enhance emotion recognition. In this study, we use the GPT-3 model to extract semantic information from transcribed texts, generating 1536-dimensional text-modality features. We then perform feature fusion, combining the 1536-dimensional text features with 1188-dimensional acoustic features to produce multimodal recognition results. Our findings show that the proposed method achieves a weighted accuracy of 79.62% on the four emotion categories of IEMOCAP, demonstrating the considerable improvement in emotion recognition accuracy gained by integrating large language models.
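The abstract specifies only the feature dimensionalities (1536-dimensional text features, 1188-dimensional acoustic features) and the use of feature fusion over four emotion classes. The sketch below illustrates one plausible early-fusion setup under those dimensions: concatenation followed by a small classifier. The PyTorch framing, hidden size, and dropout rate are assumptions for illustration, not the authors' actual architecture.

import torch
import torch.nn as nn

TEXT_DIM = 1536      # GPT-derived text embedding size (from the abstract)
ACOUSTIC_DIM = 1188  # acoustic feature size (from the abstract)
NUM_CLASSES = 4      # four IEMOCAP emotion categories

class FusionClassifier(nn.Module):
    """Early fusion: concatenate text and acoustic vectors, then classify.

    This is a minimal illustrative sketch; the paper does not describe
    the fusion network, so the MLP below is a hypothetical stand-in.
    """

    def __init__(self, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(TEXT_DIM + ACOUSTIC_DIM, hidden),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, NUM_CLASSES),
        )

    def forward(self, text_feat: torch.Tensor, acoustic_feat: torch.Tensor) -> torch.Tensor:
        # Concatenate the two per-utterance modality vectors: (batch, 2724).
        fused = torch.cat([text_feat, acoustic_feat], dim=-1)
        return self.net(fused)  # emotion logits, shape (batch, 4)

# Example forward pass with random tensors standing in for real features.
model = FusionClassifier()
logits = model(torch.randn(8, TEXT_DIM), torch.randn(8, ACOUSTIC_DIM))
print(logits.shape)  # torch.Size([8, 4])

In practice the text vectors would come from an embedding model applied to the transcripts and the acoustic vectors from an utterance-level feature extractor; both sources are outside what the abstract specifies.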

© 2024 The Institute of Electronics, Information and Communication Engineers