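Organizer: The Japanese Society for Artificial Intelligence (人工知能学会)
Conference: The 105th Meeting of the Special Interest Group on Spoken Language Understanding and Dialogue Processing (第105回言語・音声理解と対話処理研究会)
Meeting number: 105
Venue: Kuramae Hall (くらまえホール), Kuramae Memorial Hall (蔵前記念会館), Ookayama Campus, Institute of Science Tokyo (東京科学大学)
Dates: 2025/11/10 - 2025/11/11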
Pages: 31-36
When the same linguistic content is spoken with different acoustic nuances, particularly different expressed emotions, a dialogue system's response must match the nuance conveyed. However, existing SLMs such as Qwen2-Audio are not necessarily robust to such differences. In this work, we define a task of detecting whether a system response is consistent or inconsistent with the emotion label of the utterance, and we build a model to perform this prediction. We hypothesize that the emotion label acts as a control signal that modulates the interpretation of the text, and we construct the prediction model based on Feature-wise Linear Modulation (FiLM).
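
As an illustration of the FiLM idea named in the abstract, the following is a minimal sketch in plain Python, not the authors' model. FiLM computes a per-feature scale gamma and shift beta from a conditioning input and applies them to a feature vector as gamma * x + beta. Everything concrete here is an assumption made for illustration: the emotion label set `EMOTIONS`, the one-hot conditioning, the toy `text_features` vector, and the randomly initialized weights that stand in for learned parameters. The abstract does not specify the encoder, the label set, or the classifier placed on top of the modulated features.

```python
import random

# Hypothetical emotion label set; the abstract does not list the labels used.
EMOTIONS = ["neutral", "happy", "sad", "angry"]


def one_hot(label):
    """Encode an emotion label as a one-hot conditioning vector."""
    return [1.0 if label == name else 0.0 for name in EMOTIONS]


def linear(weights, bias, vec):
    """Compute weights @ vec + bias using plain Python lists."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


class FiLM:
    """FiLM layer: output = gamma(c) * x + beta(c), applied per feature."""

    def __init__(self, feature_dim, cond_dim, seed=0):
        rng = random.Random(seed)

        def init():
            return [[rng.uniform(-0.5, 0.5) for _ in range(cond_dim)]
                    for _ in range(feature_dim)]

        # Random stand-ins for projection weights that would normally be learned.
        self.w_gamma = init()
        self.w_beta = init()
        # Biases chosen so that a zero condition leaves the features unchanged.
        self.b_gamma = [1.0] * feature_dim
        self.b_beta = [0.0] * feature_dim

    def __call__(self, features, condition):
        gamma = linear(self.w_gamma, self.b_gamma, condition)  # per-feature scale
        beta = linear(self.w_beta, self.b_beta, condition)     # per-feature shift
        return [g * x + b for g, x, b in zip(gamma, features, beta)]


# Toy text representation (hypothetical values standing in for an encoder output).
text_features = [0.5, -1.0, 2.0]
film = FiLM(feature_dim=len(text_features), cond_dim=len(EMOTIONS))

# The same text features are modulated differently by different emotion labels.
happy = film(text_features, one_hot("happy"))
sad = film(text_features, one_hot("sad"))
print(happy)
print(sad)
```

In this sketch the same text representation yields different modulated features for "happy" and "sad", which is the sense in which an emotion label could act as a control signal on text interpretation. A consistency classifier would then operate on the modulated features; that part is omitted because the abstract gives no details about it.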