Host: The Japanese Society for Artificial Intelligence
Name: The 39th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 39
Location: [in Japanese]
Date: May 27, 2025 - May 30, 2025
We introduce a reinforcement learning approach for Data-to-Text generation with large language models (LLMs) that uses back-translation to numerical data. Numerical data admit multiple possible interpretations, which makes it difficult to predefine their meaning and the key points to be explained before an analysis is conducted. In this study, we focus on information recoverability when explaining numerical data and propose a reinforcement learning approach based on Proximal Policy Optimization (PPO). The approach requires no predefined references and instead uses the error of back-translation to numerical data as the reward signal. Our experiments demonstrate that the proposed method significantly improves explanatory performance after training. Furthermore, the performance achieved with our method is significantly higher than that obtained with Direct Preference Optimization (DPO), a training method that does not require designing a reward function. These results highlight the effectiveness of back-translation error as a reward for enhancing explanatory performance.
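The abstract does not include code, but the core idea of the reward can be sketched briefly. The following is a minimal Python illustration, assuming the reward is the negative mean absolute error between the original numerical values and those recovered by back-translating the generated explanation; the `back_translate` callable and the toy regex-based extractor are hypothetical stand-ins, not the authors' implementation, and the paper's exact error metric and value-alignment scheme may differ.

```python
import re
import numpy as np

def back_translation_reward(source_values, explanation, back_translate):
    """Negative back-translation error as a PPO reward (hypothetical sketch).

    `back_translate` stands in for a model that reads the generated
    explanation and recovers the numerical data it describes.
    """
    recovered = list(back_translate(explanation))
    # Align lengths before comparing (one simple convention; the paper
    # may handle missing or extra values differently).
    n = len(source_values)
    recovered = (recovered + [0.0] * n)[:n]
    error = np.mean(np.abs(np.asarray(source_values, dtype=float)
                           - np.asarray(recovered, dtype=float)))
    return -float(error)  # smaller reconstruction error -> larger reward

# Toy "back-translator": extract numbers mentioned in the text.
def toy_back_translate(text):
    return [float(tok) for tok in re.findall(r"-?\d+(?:\.\d+)?", text)]

data = [3.2, 4.8, 5.1]
text = "Sales rose from 3.2 to 4.8 and then to 5.0 million units."
print(back_translation_reward(data, text, toy_back_translate))  # ~ -0.033
```

In a PPO training loop, this scalar would score each generated explanation: explanations from which the original data can be recovered accurately receive higher reward, without requiring any predefined reference texts.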