A Study of Evaluation Metrics for Character-LLM

Haruhisa Kimoto; Yuta Hitomi; Daichi Sato; Yugo Atobe; Masahiko Koyama; Tomohiro Katada; Kei Hashimoto; Sara Giusto; Takayuki Moriya

doi:10.11517/pjsai.JSAI2024.0_4Xin2109

38th (2024)

Session ID : 4Xin2-109

Conference information

Host: The Japanese Society for Artificial Intelligence

Name : The 38th Annual Conference of the Japanese Society for Artificial Intelligence

Number : 38

Date : May 28, 2024 - May 31, 2024

Character Setting Extraction for Enhanced Character-LLM Evaluation

*Haruhisa KIMOTO, Yuta HITOMI, Daichi SATO, Yugo ATOBE, Masahiko KOYAMA, Tomohiro KATADA, Kei HASHIMOTO, Sara GIUSTO, Takayuki MORIYA

Keywords: Character-LLM, Evaluation Metric, Virtual Human

CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Since the study by Li et al., research has progressed on large language models (LLMs) that role-play characters, termed Character-LLM. Li et al. explored how well 32 characters could be reproduced using two approaches: Retrieval-Augmented Generation (RAG) and fine-tuning. Wang et al. proposed a method to quantitatively evaluate the personality traits of characters role-played by LLMs using psychological measures such as the Big Five and MBTI. Shao et al. used ChatGPT to evaluate role-playing proficiency along five axes. However, Wang et al.'s method assesses only the similarity of personality traits, while the evaluation criteria in Shao et al.'s method are unclear, so its results risk being a black box. Role-playing in Japanese, moreover, demands precise reproduction of various linguistic elements. This study proposes a method for automatically evaluating chatbots that role-play characters: character settings are extracted from the character's past utterances, and automatic evaluation is then conducted based on them. In the experiment, 54 character settings were extracted from imma's tweet data, and the automatic evaluation was assessed by macro-averaged precision.
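
For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of how macro-averaged precision is generally computed: precision is calculated separately for each label and the per-label values are averaged with equal weight. The function name, the label strings, and the toy data are hypothetical and chosen only for illustration; the abstract does not state how the authors define labels or aggregate scores, so this should not be read as their implementation.

```python
from collections import defaultdict


def macro_precision(gold, pred):
    """Return the unweighted mean of per-label precision.

    gold and pred are equal-length sequences of labels.
    A label that is never predicted contributes a precision of 0.0
    (an assumption; other conventions exclude such labels).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    true_pos = defaultdict(int)   # predictions of a label that were correct
    false_pos = defaultdict(int)  # predictions of a label that were wrong
    for g, p in zip(gold, pred):
        if g == p:
            true_pos[p] += 1
        else:
            false_pos[p] += 1

    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0

    per_label = []
    for label in labels:
        predicted = true_pos[label] + false_pos[label]
        per_label.append(true_pos[label] / predicted if predicted else 0.0)
    return sum(per_label) / len(per_label)


# Hypothetical toy data: whether an utterance matches a character setting.
gold = ["match", "match", "mismatch", "mismatch"]
pred = ["match", "mismatch", "mismatch", "mismatch"]
score = macro_precision(gold, pred)
```

In this toy case "match" is predicted once and is correct (precision 1), while "mismatch" is predicted three times and is correct twice (precision 2/3), so the macro average is 5/6. Because every label is weighted equally, rare labels influence the score as much as frequent ones.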
