Understanding Grokking from the Perspective of the Lottery Ticket Hypothesis

峰岸 剛基; 岩澤 有祐; 松尾 豊

doi:10.11517/pjsai.JSAI2024.0_1B4GS203

Abstract

Grokking is the intriguing phenomenon of delayed generalization: a network first reaches a memorization solution with perfect training accuracy but limited generalization, and only after much further training does it attain a generalization solution. This paper challenges the previous view that weight norm reduction explains grokking, demonstrating experimentally that identifying optimal subnetworks plays a crucial role in achieving generalization. Drawing on the lottery ticket hypothesis, we argue that finding these "lottery tickets" is key to the transition from memorization to generalization. We present empirical evidence that (1) with the proper subnetworks, delayed generalization does not occur; (2) with a similar weight norm, dense networks still require substantially longer training to achieve full generalization; and (3) with structure optimization alone (without updating the weight values), the memorization solution can be converted into a generalization solution. These results emphasize the importance of subnetwork identification over weight norm reduction in explaining the delayed generalization seen in grokking.
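To make the notion of "structure optimization without updating the value of weights" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a subnetwork is represented by a binary mask over frozen weights, chosen here by keeping the top-scoring entries. The names `top_k_mask`, `apply_mask`, `l2_norm`, the toy `weights` and `scores`, and the `keep_ratio` parameter are all hypothetical and introduced only for illustration.

```python
import math

def top_k_mask(scores, keep_ratio):
    """Return a binary mask keeping the highest-scoring fraction of entries.

    Ties at the threshold are all kept, so slightly more than
    keep_ratio of the entries may survive.
    """
    flat = sorted((s for row in scores for s in row), reverse=True)
    k = max(1, int(round(keep_ratio * len(flat))))
    threshold = flat[k - 1]
    return [[1 if s >= threshold else 0 for s in row] for row in scores]

def apply_mask(weights, mask):
    """Select a subnetwork: weight values are frozen, only the mask changes."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]

def l2_norm(weights):
    """L2 norm over all entries of a 2-D weight list."""
    return math.sqrt(sum(w * w for row in weights for w in row))

# Toy dense layer (frozen) and hypothetical per-weight scores.
weights = [[0.5, -1.2, 0.1], [2.0, -0.3, 0.7]]
scores = [[0.9, 0.1, 0.4], [0.8, 0.2, 0.6]]

mask = top_k_mask(scores, keep_ratio=0.5)
subnetwork = apply_mask(weights, mask)

print("mask:", mask)
print("dense norm:", l2_norm(weights))
print("subnetwork norm:", l2_norm(subnetwork))
```

In this sketch only `scores` (and therefore `mask`) would change during structure optimization; the entries of `weights` are never updated. Comparing `l2_norm(weights)` with `l2_norm(subnetwork)` also shows that weight norm and structure are separate quantities that can be examined independently.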
