Building a Multifaceted Evaluation Leaderboard for Japanese LLMs

山本 祐也; 鎌田 啓輔; 柴田 暁

doi:10.11517/pjsai.JSAI2024.0_2G1GS1104

Abstract

Nejumi LLM Leaderboard Neo aims to provide a comprehensive, multi-perspective evaluation of Japanese large language models (LLMs). The leaderboard assesses models on both language understanding and language generation, combining question-and-answer benchmark tests with Japanese text generation tasks. Operating the leaderboard has highlighted the importance of model comparison and the need for transparent, uniform evaluation criteria. Models differed in their conversational abilities and in their responses to structured questions, and the results revealed a correlation between language understanding and generative ability in conversation. Among models of comparable parameter size, however, a trade-off between the two was observed. Nejumi LLM Leaderboard Neo offers a novel approach to evaluating Japanese LLMs and contributes to the further evolution and improvement of Japanese language models.
