Host: The Japanese Society for Artificial Intelligence
Name : The 105th SIG-SLUD
Number : 105
Location : [in Japanese]
Date : November 10, 2025 - November 11, 2025
Pages : 07-11
While Large Language Models (LLMs) show significant potential in psychological counseling, rigorous, multi-dimensional evaluation of dialogue quality is essential for service reliability and professional accountability: such assessment identifies best practices, drives continuous improvement of LLM performance, and builds user trust in automated mental health support. To address this evaluation challenge, we introduce a novel Explanation-Guided Score Prediction Framework built on KokoroChat, a large-scale Japanese counseling dialogue dataset. The framework enhances the evaluation of LLM-based counseling systems by integrating quantitative score prediction with interpretable, structured explanations. These LLM-generated rationales, each comprising a "reason" and a "reflection", serve as auxiliary supervision signals during training, aligning the model's predictions with human evaluative reasoning and encouraging the model to learn semantically rich representations of counseling dialogues.
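To make the auxiliary-supervision idea concrete, the sketch below shows one plausible wiring (not the paper's actual implementation): a shared encoder feeds both a score-regression head and a text head, and the explanation loss is added to the score loss during training. All module names, dimensions, the loss weight, and the position-wise alignment of explanation tokens are illustrative assumptions.

# Minimal sketch of explanation-guided score prediction, assuming a
# multi-task setup: predict a quality score and reproduce the human
# rationale ("reason" + "reflection"), with the rationale loss acting
# as an auxiliary supervision signal. Hyperparameters are toy values.
import torch
import torch.nn as nn

class ExplanationGuidedScorer(nn.Module):
    def __init__(self, vocab_size=32000, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.score_head = nn.Linear(hidden, 1)        # dialogue-quality score
        self.lm_head = nn.Linear(hidden, vocab_size)  # explanation tokens

    def forward(self, dialogue_ids):
        h = self.encoder(self.embed(dialogue_ids))    # (batch, seq, hidden)
        pooled = h.mean(dim=1)                        # simple mean pooling
        return self.score_head(pooled).squeeze(-1), self.lm_head(h)

model = ExplanationGuidedScorer()
dialogue = torch.randint(0, 32000, (2, 64))    # toy batch of dialogue token ids
gold_score = torch.tensor([3.0, 4.5])          # human evaluation scores
gold_expl = torch.randint(0, 32000, (2, 64))   # tokenized "reason" + "reflection"

pred_score, expl_logits = model(dialogue)
loss_score = nn.functional.mse_loss(pred_score, gold_score)
# Toy simplification: explanation targets are aligned position-wise with the
# dialogue; a real system would decode the rationale autoregressively.
loss_expl = nn.functional.cross_entropy(
    expl_logits.reshape(-1, 32000), gold_expl.reshape(-1))
loss = loss_score + 0.5 * loss_expl            # explanation as auxiliary signal
loss.backward()

The key design point the sketch illustrates is that the explanation loss shapes the shared encoder's representations, so the score head is trained on features that also support human-style evaluative reasoning rather than on score regression alone.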