Examining World-Model Emergence in Language Models: Internal Representation Analysis via Contribution-Based Pruning with Probes

西浦 直哉; 青木 洸士郎; 武田 大佑; 熊谷 亘; 松尾 豊

doi:10.11517/pjsai.JSAI2024.0_3O1OS16b04

Abstract

Whether language models develop internal world models has been an active research question. One prior study showed that Othello-GPT, a language model trained to predict legal moves in the board game Othello, spontaneously acquired an internal representation of the game board; by intervening on these internal representations, that study offered insight into how world models emerge. In this paper, we use Othello-GPT together with probes and SHapley Additive exPlanations (SHAP), a method that computes how much each input feature contributes to a prediction. With these tools, we quantified the contribution of each neuron in the model's internal layers to the probe's prediction of the current Othello board state. We then pruned neurons in Othello-GPT in order of their contribution values. Legal-move prediction accuracy remained higher when pruning started from the neurons with the lowest contribution values than when it started from those with the highest. This result suggests that Othello-GPT uses its internal board representation to predict legal moves.
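The contribution-then-prune procedure described in the abstract can be sketched in NumPy. This is a minimal sketch under assumptions the abstract does not state: the probe is linear (logits = acts @ W + b) with weights `W` already trained on one layer's activations; SHAP values are computed exactly for that linear model under a feature-independence assumption, phi = W[i, o] * (x[s, i] - E[x[i]]), instead of with the `shap` library; a neuron's contribution is its mean absolute SHAP value over samples and probe outputs; and "pruning" means zeroing a neuron's activation. All helper names are hypothetical, and the random activations are a toy stand-in for real Othello-GPT activations.

```python
import numpy as np


def linear_probe_shap(acts, W, background):
    """Exact SHAP values for a linear probe with logits = acts @ W + b.

    Assumes independent features, for which
    phi[s, i, o] = W[i, o] * (acts[s, i] - background[i]).
    acts: (samples, neurons), W: (neurons, outputs) -> (samples, neurons, outputs)
    """
    centered = acts - background  # deviation of each neuron from its baseline
    return centered[:, :, None] * W[None, :, :]


def neuron_contributions(acts, W):
    """Mean |SHAP| per neuron, averaged over samples and probe outputs."""
    phi = linear_probe_shap(acts, W, acts.mean(axis=0))
    return np.abs(phi).mean(axis=(0, 2))


def pruning_order(contrib, lowest_first=True):
    """Neuron indices in the order they would be pruned."""
    order = np.argsort(contrib, kind="stable")
    return order if lowest_first else order[::-1]


def prune(acts, order, k):
    """Zero out the first k neurons in `order` (hypothetical pruning rule)."""
    pruned = acts.copy()
    pruned[:, order[:k]] = 0.0
    return pruned


# Toy example: 8 "neurons" and a 3-output probe that only reads neurons 2 and 5.
rng = np.random.default_rng(0)
acts = rng.normal(size=(256, 8))
W = np.zeros((8, 3))
W[2] = [2.0, -1.0, 0.5]
W[5] = [0.1, 0.0, 0.0]

contrib = neuron_contributions(acts, W)
high_first = pruning_order(contrib, lowest_first=False)
print(high_first[:2])  # prints [2 5]
pruned_acts = prune(acts, high_first, k=1)  # removes neuron 2 only
```

To mirror the comparison in the abstract, one would pass the pruned activations through the rest of the model for both orders (lowest-first and highest-first) at increasing k and compare legal-move accuracy; that step depends on the model implementation and is omitted here. Exact linear SHAP is used only because it is cheap and checkable: for each sample, the per-neuron values sum to the probe logit minus its mean.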
