Host : The Japanese Society for Artificial Intelligence
Name : The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 38
Location : [in Japanese]
Date : May 28, 2024 - May 31, 2024
Whether world models emerge in language models is an active subject of research. One such study showed that Othello-GPT, a language model trained to predict legal moves in Othello, spontaneously acquired a world representation of the game, and it gave insight into the emergence of world models by intervening on the model's internal representations. In this paper, we utilize Othello-GPT, probes, and SHapley Additive exPlanations (SHAP), a method that computes the contribution of each input to a prediction. Using these methods, we identified the contribution of each neuron in the inner layer to the current state of the Othello board, as read out by the probes. We then pruned the neurons in Othello-GPT based on their contribution values. As a result, the accuracy of predicting legal moves was higher when neurons were pruned in ascending order of contribution (lowest first) than when they were pruned in descending order (highest first). This result suggests that Othello-GPT utilizes its internal representations to predict legal moves.
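The contribution-ranked pruning comparison described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single linear probe, synthetic Gaussian activations in place of Othello-GPT's neurons, and the closed-form SHAP value for a linear model with independent features, phi_i(x) = w_i * (x_i - E[x_i]). The probe's own output stands in for the downstream legal-move prediction, and every name in the code (linear_shap, prune, probe_w, and so on) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions, not Othello-GPT data):
# 2000 samples of 32 "neuron" activations and a linear probe
# whose first 4 weights dominate the prediction.
n_samples, n_neurons, k = 2000, 32, 4
activations = rng.normal(size=(n_samples, n_neurons))
probe_w = np.full(n_neurons, 0.05)
probe_w[:4] = 2.0
probe_b = 0.0
labels = activations @ probe_w + probe_b > 0  # toy target


def linear_shap(weights, acts):
    """Exact SHAP values for a linear model with independent features."""
    return weights * (acts - acts.mean(axis=0))


def prune(acts, order, num_pruned):
    """Zero out the first `num_pruned` neurons listed in `order`."""
    pruned = acts.copy()
    pruned[:, order[:num_pruned]] = 0.0
    return pruned


def accuracy(acts):
    """Fraction of samples where the probe still matches the toy target."""
    preds = acts @ probe_w + probe_b > 0
    return float((preds == labels).mean())


# Per-neuron importance: mean absolute SHAP value over all samples.
importance = np.abs(linear_shap(probe_w, activations)).mean(axis=0)
order = np.argsort(importance)  # ascending: least important first

acc_low_first = accuracy(prune(activations, order, k))
acc_high_first = accuracy(prune(activations, order[::-1], k))

print(f"pruned {k} lowest-contribution neurons:  accuracy = {acc_low_first:.3f}")
print(f"pruned {k} highest-contribution neurons: accuracy = {acc_high_first:.3f}")
```

In this toy setting, removing the lowest-contribution neurons leaves accuracy largely intact, while removing the highest-contribution neurons degrades it, which mirrors the direction of the comparison reported in the abstract. Zeroing a neuron is one possible pruning rule; the abstract does not specify which rule was used.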