Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 4A1-GS-6-02

Learning Methods for LLMs on Game Data Using RLHF
*Tomoya MURATA, Naoki MORI, Makoto OKADA
Keywords: LLM, Alignment, RLHF, BERT

Abstract

Recent advances in Large Language Models (LLMs) have yielded exceptional performance across a wide range of natural language processing tasks. Amid these developments, aligning the values and objectives of LLMs with human perspectives has become increasingly important, and Reinforcement Learning from Human Feedback (RLHF) has attracted notable interest as a method for such alignment. This study explored an RLHF-based learning approach for LLMs, using scenarios from the romance simulation game 'Tokimeki Memorial 3' as the game scenario data. Specifically, we conducted an experiment in which Japanese sentences were generated for five of the game's characters, tailored to each character's personality. Although the evaluation was subjective, it demonstrated that the model could produce sentences appropriately matched to each of the game's distinct characters.
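The core RLHF loop the abstract refers to can be illustrated in miniature. The sketch below is hypothetical and not the paper's implementation: it replaces the LLM with a softmax policy over a few candidate lines, and the learned reward model with a toy keyword-overlap score against an assumed character persona, then applies a simple REINFORCE-style policy-gradient update so that on-persona lines gain probability.

```python
import math
import random

# Assumed toy persona and candidate lines (token sets, for brevity).
# These stand in for the game characters and generated sentences.
PERSONA = {"cheerful", "energetic", "sports"}

CANDIDATES = [
    {"cheerful", "sports"},      # on-persona line
    {"gloomy", "quiet"},         # off-persona line
    {"energetic", "cheerful"},   # on-persona line
]

def reward(tokens):
    """Toy reward model: fraction of persona keywords the line matches."""
    return len(tokens & PERSONA) / len(PERSONA)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(steps=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATES)
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a "generation" from the current policy.
        i = rng.choices(range(len(CANDIDATES)), weights=probs)[0]
        r = reward(CANDIDATES[i])
        # REINFORCE-style update: push up the log-probability of the
        # sampled line in proportion to the reward it received.
        for j in range(len(logits)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * r * grad
    return softmax(logits)

probs = train()
```

After training, probability mass concentrates on the two on-persona lines and the off-persona line is rarely sampled. Production RLHF systems replace the keyword score with a learned reward model (e.g. BERT-based, as the keywords suggest) and use a constrained policy-gradient method such as PPO, but the sample-score-update structure is the same.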

© 2024 The Japanese Society for Artificial Intelligence