Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 2G1-GS-11-04

Development of a Comprehensive Evaluation Leaderboard for Japanese Language LLMs
*Yuya YAMAMOTO, Keisuke KAMATA, Akira SHIBATA

Abstract

Nejumi LLM Leaderboard Neo aims to provide a comprehensive, multi-perspective evaluation of Japanese large language models (LLMs), assessing both their language understanding and their language generation capabilities. To this end, it combines question-and-answer benchmark tests, which measure comprehension, with Japanese text generation tasks, which measure generation ability. Insights gained from operating the leaderboard highlight the value of comparing models side by side and the need for transparent, uniform evaluation criteria. We observed differences among models in conversational ability and in how they respond to structured questions, and found a correlation between language understanding and conversational generation ability. However, a trade-off between these abilities has been observed among models of comparable parameter size. Nejumi LLM Leaderboard Neo offers a novel approach to evaluating Japanese LLMs and contributes to the further development and improvement of Japanese language models.
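The abstract does not state how the question-and-answer benchmark scores and the generation-task scores are combined into a leaderboard ranking. The following is a minimal Python sketch, assuming an equal-weight average after rescaling a generation score on a hypothetical 1-10 scale to the same [0, 1] range as Q&A accuracy. The weighting, the score ranges, and the names `ModelResult`, `overall_score`, `rank`, and `pearson` are illustrative assumptions, not the paper's method. It also shows how the correlation between the two abilities across models could be measured.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class ModelResult:
    name: str
    qa_score: float   # hypothetical: mean accuracy on Q&A benchmarks, in [0, 1]
    gen_score: float  # hypothetical: mean rating on generation tasks, on a 1-10 scale


def overall_score(r: ModelResult, gen_max: float = 10.0) -> float:
    """Equal-weight average of both axes after rescaling generation to [0, 1] (assumed weighting)."""
    return mean([r.qa_score, r.gen_score / gen_max])


def rank(results: list[ModelResult]) -> list[tuple[str, float]]:
    """Return (model name, overall score) pairs sorted from highest to lowest score."""
    scored = [(r.name, overall_score(r)) for r in results]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two entries")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

For example, `pearson([r.qa_score for r in results], [r.gen_score for r in results])` measures how closely understanding and generation track each other across a list of `ModelResult` entries. Any model names and numbers passed in are placeholders, not leaderboard results. An equal-weight average keeps neither axis dominant, but a real leaderboard may weight or normalize the categories differently.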

© 2024 The Japanese Society for Artificial Intelligence