Host: The Japanese Society for Artificial Intelligence
Name: The 39th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 39
Location: [in Japanese]
Date: May 27, 2025 - May 30, 2025
We introduce a reinforcement learning approach for Data-to-Text generation with large language models (LLMs) that uses back-translation to numerical data. Numerical data admit multiple possible interpretations, which makes it difficult to predefine their meaning and the key points to be explained before an analysis is conducted. In this study, we focus on information recoverability when explaining numerical data and propose a reinforcement learning approach based on Proximal Policy Optimization (PPO). The approach requires no predefined references and instead uses the error of back-translation to numerical data as the reward signal. Our experiments demonstrate that the proposed method significantly improves explanatory performance after training. Furthermore, the performance achieved with our method is significantly higher than that obtained with Direct Preference Optimization (DPO), a training method that does not require designing a reward function. These results highlight the effectiveness of back-translation error as a reward for enhancing explanatory performance.
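The abstract does not include code, but the core idea of the reward can be sketched briefly. The following is a minimal Python illustration, assuming the reward is the negative mean absolute error between the original numerical values and those recovered by back-translating the generated explanation; the `back_translate` callable and the toy regex-based extractor are hypothetical stand-ins, not the authors' implementation, and the paper's exact error metric and value-alignment scheme may differ.

```python
import re
import numpy as np

def back_translation_reward(source_values, explanation, back_translate):
    """Negative back-translation error as a PPO reward (hypothetical sketch).

    `back_translate` stands in for a model that reads the generated
    explanation and recovers the numerical data it describes.
    """
    recovered = list(back_translate(explanation))
    # Align lengths before comparing (one simple convention; the paper
    # may handle missing or extra values differently).
    n = len(source_values)
    recovered = (recovered + [0.0] * n)[:n]
    error = np.mean(np.abs(np.asarray(source_values, dtype=float)
                           - np.asarray(recovered, dtype=float)))
    return -float(error)  # smaller reconstruction error -> larger reward

# Toy "back-translator": extract numbers mentioned in the text.
def toy_back_translate(text):
    return [float(tok) for tok in re.findall(r"-?\d+(?:\.\d+)?", text)]

data = [3.2, 4.8, 5.1]
text = "Sales rose from 3.2 to 4.8 and then to 5.0 million units."
print(back_translation_reward(data, text, toy_back_translate))  # ~ -0.033
```

In a PPO training loop, this scalar would score each generated explanation: explanations from which the original data can be recovered accurately receive higher reward, without requiring any predefined reference texts.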