Organizer: The Japan Society of Mechanical Engineers
Conference: 2024 Annual Meeting
Dates: 2024/09/08 - 2024/09/11
Many technologies have been proposed for identifying speakers and transcribing speech from multi-speaker conversation data such as meetings, and several companies offer commercial services for this purpose. These services are continuously being improved, and the accuracy of both speaker identification and transcription keeps rising. Some services also attempt to estimate the emotions of individual speakers. However, such emotion estimation has so far been limited to one-on-one audio, such as call-center recordings, and does not extend to multi-speaker conversation data. Judging from multi-speaker conversation data whether a discussion became heated therefore still relies on textual information alone, which makes it difficult to infer the emotional context accurately. This study accordingly attempts to identify the speakers in multi-speaker conversation data, extract the corresponding audio segments, and estimate the emotions of each identified speaker. As a result, it was possible to detect the sections in which a given speaker spoke alone and to estimate the emotions in those sections.
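The pipeline outlined above (speaker diarization, extraction of single-speaker sections, and per-section emotion estimation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the pyannote.audio diarization pipeline and the HuggingFace speech emotion recognition model superb/wav2vec2-base-superb-er as stand-in components, and the file name meeting.wav is hypothetical.

    # Minimal sketch: diarize -> keep non-overlapping (solo) turns -> classify emotion.
    # Assumed components (not from the paper): pyannote.audio diarization and a
    # HuggingFace emotion model; "meeting.wav" is a hypothetical recording.
    from pyannote.audio import Pipeline
    from transformers import pipeline as hf_pipeline
    import soundfile as sf

    AUDIO_PATH = "meeting.wav"  # hypothetical multi-speaker recording

    # 1. Speaker diarization: who spoke when.
    #    (Loading this pretrained pipeline may require a HF access token.)
    diarizer = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1")
    diarization = diarizer(AUDIO_PATH)
    turns = [(turn.start, turn.end, spk)
             for turn, _, spk in diarization.itertracks(yield_label=True)]

    # 2. Keep only turns with no temporal overlap, i.e. sections where
    #    a single speaker spoke alone.
    def is_solo(i):
        s, e, _ = turns[i]
        return all(e2 <= s or s2 >= e
                   for j, (s2, e2, _) in enumerate(turns) if j != i)

    solo_turns = [t for i, t in enumerate(turns) if is_solo(i)]

    # 3. Emotion estimation on each single-speaker section.
    emotion = hf_pipeline("audio-classification",
                          model="superb/wav2vec2-base-superb-er")
    audio, sr = sf.read(AUDIO_PATH)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)  # down-mix to mono

    for start, end, spk in solo_turns:
        clip = audio[int(start * sr):int(end * sr)]
        top = emotion({"raw": clip, "sampling_rate": sr})[0]
        print(f"{spk} [{start:.1f}-{end:.1f}s]: {top['label']} ({top['score']:.2f})")

In this sketch, restricting emotion estimation to non-overlapping turns mirrors the result reported in the abstract: emotions are estimated only for sections where a given speaker spoke alone, since overlapped speech would mix cues from multiple speakers.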