Emotional Dialogue Breakdown Detection in Spoken Dialogue Using FiLM

中畔 彪雅; 吉野 幸一郎

doi:10.11517/jsaislud.105.0_31

Abstract

When the same linguistic content carries different acoustic nuances, particularly in the emotion expressed, the dialogue system's response must match that nuance. However, existing SLMs such as Qwen2-Audio are not necessarily robust to such differences. In this work, we define a task of detecting whether the emotion label of an utterance and the system's response are consistent or inconsistent, and we build a model to perform this prediction. We hypothesize that emotion labels act as a control signal that modulates text interpretation, and we construct a prediction model based on Feature-wise Linear Modulation (FiLM).
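
For readers unfamiliar with FiLM, the following is a minimal sketch of the general idea: a conditioning signal (here the emotion label) produces a per-feature scale gamma and shift beta that are applied to text features as gamma * h + beta before a binary consistency score is computed. The abstract does not describe the actual architecture, so everything concrete here is an assumption made for illustration: the emotion inventory, the feature dimension, the one-hot conditioning, the single linear output head, and the names `FiLMConsistencyScorer`, `film`, and `score`. The weights are random and untrained, so the printed scores carry no meaning.

```python
import math
import random

# Hypothetical emotion inventory; the abstract does not list the labels used.
EMOTIONS = ["neutral", "happy", "sad", "angry"]


def linear(weights, bias, x):
    """Affine map y = W x + b on plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def random_matrix(rows, cols, rng):
    """Small random weights, standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class FiLMConsistencyScorer:
    """Untrained toy scorer: the emotion label modulates text features via FiLM."""

    def __init__(self, feature_dim, seed=0):
        rng = random.Random(seed)
        n = len(EMOTIONS)
        # FiLM generator: one-hot emotion -> per-feature scale (gamma) and shift (beta).
        self.w_gamma = random_matrix(feature_dim, n, rng)
        self.w_beta = random_matrix(feature_dim, n, rng)
        self.b_gamma = [1.0] * feature_dim  # bias of 1 keeps gamma near identity
        self.b_beta = [0.0] * feature_dim
        # Binary head (an assumption): modulated features -> single logit.
        self.w_out = random_matrix(1, feature_dim, rng)
        self.b_out = [0.0]

    def film(self, text_features, emotion):
        one_hot = [1.0 if e == emotion else 0.0 for e in EMOTIONS]
        gamma = linear(self.w_gamma, self.b_gamma, one_hot)
        beta = linear(self.w_beta, self.b_beta, one_hot)
        # Feature-wise Linear Modulation: gamma * h + beta, element by element.
        return [g * h + b for g, h, b in zip(gamma, text_features, beta)]

    def score(self, text_features, emotion):
        logit = linear(self.w_out, self.b_out, self.film(text_features, emotion))[0]
        return 1.0 / (1.0 + math.exp(-logit))  # sigmoid, so the score lies in (0, 1)


scorer = FiLMConsistencyScorer(feature_dim=4)
features = [0.5, -1.0, 0.25, 2.0]  # stand-in for an encoded (utterance, response) pair
print(scorer.score(features, "happy"), scorer.score(features, "angry"))
```

In this sketch the same text features yield different modulated representations under different emotion labels, which is the sense in which the label acts as a control signal. A label outside the assumed inventory maps to an all-zero one-hot vector, so the modulation falls back to the identity.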
