Host : The Japanese Society for Artificial Intelligence
Name : The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 38
Location : [in Japanese]
Date : May 28, 2024 - May 31, 2024
In software development, code documentation is crucial for understanding and maintaining software. Creating and maintaining code documentation manually is costly, which has led to growing interest in automatic generation using Large Language Models (LLMs). However, conventional match-based evaluation cannot account for semantic interpretation and incurs high costs for preparing reference texts. To address these issues, we propose an execution-based evaluation method using back-translation. Our approach back-translates LLM-generated code documentation into code and evaluates the documentation based on the execution results of that code. This process enables assessments that take into account semantic interpretation, synonyms, and diversity of expression. In this paper, we introduce lm-chaineval-harness, an automated evaluation tool developed by our team that implements the proposed method and provides a user-friendly evaluation environment, and we discuss validation experiments. The experimental results qualitatively show that the proposed method enables evaluation that incorporates semantic interpretation and accounts for synonyms and diversity of expression.
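The back-translation idea described above can be illustrated with a minimal Python sketch. This is not the lm-chaineval-harness API, which is not shown here; every name below (`passes_tests`, `evaluate_documentation`, `fake_back_translate`) is hypothetical, and the LLM that converts documentation back into code is replaced by a hard-coded stand-in so that the example runs on its own. The assumption is that a piece of documentation is judged by whether the code regenerated from it passes a set of test cases.

```python
from typing import Callable, List, Tuple

TestCase = Tuple[tuple, object]  # (positional arguments, expected return value)


def passes_tests(code: str, func_name: str, tests: List[TestCase]) -> bool:
    """Execute `code` and check that `func_name` passes every test case."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # assumption: the code is trusted or sandboxed elsewhere
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        # Any syntax error, missing function, or runtime error counts as a failure.
        return False


def evaluate_documentation(
    doc: str,
    back_translate: Callable[[str], str],
    func_name: str,
    tests: List[TestCase],
) -> bool:
    """Back-translate documentation into code and score it by execution."""
    regenerated_code = back_translate(doc)
    return passes_tests(regenerated_code, func_name, tests)


def fake_back_translate(doc: str) -> str:
    """Hypothetical stand-in for an LLM call; returns code based on keywords."""
    text = doc.lower()
    if "sum" in text or "adds" in text:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b):\n    return a - b\n"


tests = [((1, 2), 3), ((0, 0), 0), ((-1, 5), 4)]

# Two differently worded but semantically equivalent descriptions both pass.
good = evaluate_documentation(
    "Returns the sum of two numbers.", fake_back_translate, "add", tests
)
also_good = evaluate_documentation(
    "Adds a and b together.", fake_back_translate, "add", tests
)
# A description with the wrong meaning fails when the regenerated code is executed.
bad = evaluate_documentation(
    "Returns the difference of two numbers.", fake_back_translate, "add", tests
)
```

In this sketch, the two paraphrased descriptions receive the same score because the evaluation depends on the behavior of the regenerated code rather than on word overlap with a reference text, which is the property the abstract attributes to execution-based evaluation.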