Host: The Japanese Society for Artificial Intelligence
Name: The 37th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 37
Location: [in Japanese]
Date: June 06, 2023 - June 09, 2023
Distributed parallel training has become necessary as deep learning models and datasets grow. Data parallelism is the simplest distributed training method to implement: each GPU holds a replica of the model, and each mini-batch is split across the GPUs. However, as the number of GPUs increases, the effective batch size grows proportionally, and generalization performance deteriorates because the implicit regularization effect of SGD is lost. In this study, we aim to alleviate this large-batch problem by regularizing training with the gradient norm.
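As a minimal sketch of the idea (not the paper's implementation), the snippet below applies a gradient-norm penalty to a 1-D quadratic loss L(w) = 0.5·a·w². The regularized objective is L(w) + λ·(dL/dw)², and here both gradients are computed analytically; in a deep network the penalty gradient would require a second backward pass. All names and the hyperparameter values are illustrative assumptions.

```python
# Hypothetical sketch of gradient-norm regularization on a toy loss.
# L(w) = 0.5 * a * w**2, so dL/dw = a*w and the penalty (dL/dw)**2
# has gradient 2 * a**2 * w (analytic here; a real model would need
# double backpropagation to get this term).

def train(lam, a=4.0, w0=1.0, lr=0.05, steps=30):
    w = w0
    for _ in range(steps):
        g = a * w                    # dL/dw
        g_pen = 2.0 * a * a * w      # d/dw of (dL/dw)^2
        w -= lr * (g + lam * g_pen)  # SGD step on the regularized objective
    return w, abs(a * w)             # final weight and its gradient norm

w_plain, gn_plain = train(lam=0.0)   # plain SGD
w_reg, gn_reg = train(lam=0.02)      # with gradient-norm penalty
```

With the penalty, the optimizer descends faster where the gradient is large, so `gn_reg` ends up smaller than `gn_plain`; this preference for small-gradient (flatter) regions is the intuition behind using the gradient norm as a regularizer in large-batch training.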