Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 1B5-GS-2-02

Explicit modeling of the implicit regularization effect of SGD
*Shota NAKAMURA, Rio YOKOTA
Abstract

Distributed parallel training has become necessary as deep learning models and datasets grow. Data parallelism is the easiest distributed training method to implement: each GPU holds a replica of the model, and each batch is split across the GPUs. However, as the number of GPUs increases, the effective batch size grows proportionally, and generalization performance deteriorates because the implicit regularization effect of SGD is lost. In this study, we aim to alleviate this large-batch problem by explicitly regularizing the loss with the gradient norm.
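The abstract does not give the authors' implementation, but the general idea of making SGD's implicit regularization explicit is to add a penalty proportional to the squared gradient norm, i.e. to minimize L(w) + λ‖∇L(w)‖². A minimal toy sketch of this idea on a least-squares problem, with an assumed regularization strength `lam` (all names and values here are illustrative, not from the paper):

```python
import numpy as np

# Toy illustration: explicit gradient-norm regularization on least squares.
# Objective: J(w) = L(w) + lam * ||grad L(w)||^2,
# where L(w) = mean((X w - y)^2).

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
y = rng.normal(size=32)
lam = 0.01   # regularization strength (assumed hyperparameter)
lr = 0.05    # learning rate for plain gradient descent

def loss(w):
    r = X @ w - y
    return np.mean(r ** 2)

def grad(w):
    # Gradient of the mean squared error: (2/n) * X^T (X w - y)
    return 2.0 / len(y) * X.T @ (X @ w - y)

# Hessian of L is constant for least squares: (2/n) * X^T X
H = 2.0 / len(y) * X.T @ X

def reg_grad(w):
    g = grad(w)
    # Chain rule: d/dw [ lam * ||g||^2 ] = 2 * lam * H @ g
    return g + 2.0 * lam * H @ g

w = np.zeros(4)
losses = [loss(w)]
for _ in range(100):
    w -= lr * reg_grad(w)        # descend on the regularized objective
    losses.append(loss(w))
```

For a least-squares loss the penalty gradient is available in closed form via the Hessian; for deep networks one would instead differentiate through the gradient (e.g. double backpropagation), which this sketch deliberately avoids for clarity.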

© 2023 The Japanese Society for Artificial Intelligence