ADOPT: Proposal of an Adaptive Optimization Algorithm That Converges at the Optimal Rate Independently of Hyperparameters

Shohei Taniguchi; Keno Harada; Gouki Minegishi; Yuta Oshima; Seong Cheol Jeong; Go Nagahara; Tomoshi Iiyama; Masahiro Suzuki; Yusuke Iwasawa; Yutaka Matsuo

doi:10.11517/pjsai.JSAI2024.0_4D3GS201

38th (2024)

Session ID: 4D3-GS-2-01

Conference information

Host: The Japanese Society for Artificial Intelligence

Name: The 38th Annual Conference of the Japanese Society for Artificial Intelligence

Number: 38

Date: May 28, 2024 - May 31, 2024

ADOPT: an Adaptive Gradient Method with the Optimal Convergence Rate with Any Hyperparameters

*Shohei TANIGUCHI, Keno HARADA, Gouki MINEGISHI, Yuta OSHIMA, Seong Cheol JEONG, Go NAGAHARA, Tomoshi IIYAMA, Masahiro SUZUKI, Yusuke IWASAWA, Yutaka MATSUO

Keywords: stochastic optimization, Adam

Abstract

Adaptive gradient methods such as Adam are widely used in deep learning. However, they are known to fail to converge unless their hyperparameters are chosen in a problem-dependent manner. There have been many attempts to fix this non-convergence (e.g., AMSGrad), but they rely on the impractical assumption that the gradient noise is uniformly bounded. In this paper, we propose a new adaptive gradient method named ADOPT, which achieves the optimal convergence rate of O(1/√T) with any choice of hyperparameters and without the bounded-noise assumption. ADOPT addresses Adam's non-convergence by removing the current gradient from the second-moment estimate and by swapping the order of the momentum update and the scaling by the second-moment estimate. We also conduct extensive numerical experiments and verify that ADOPT achieves results that are competitive with, or better than, those of Adam and its variants across a wide range of tasks, including image classification, generative modeling, language modeling, and deep reinforcement learning.
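The two changes described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' reference implementation. The class name `AdoptSketch`, the zero initialisation of the momentum, the choice to spend the first call only on seeding the second-moment estimate with g0², and the default values of `beta2` and `eps` are all assumptions made for illustration. What the sketch does show is the ordering the abstract describes. The gradient is first divided by the *previous* second-moment estimate, which does not yet contain the current gradient. Only then is it fed into the momentum, and the second-moment estimate is updated last.

```python
import math
from typing import List


class AdoptSketch:
    """Illustrative ADOPT-style update on a flat list of float parameters.

    Hypothetical helper: the initialisation and default hyperparameters are
    assumptions for this sketch, not values taken from the abstract.
    """

    def __init__(self, lr: float = 1e-3, beta1: float = 0.9,
                 beta2: float = 0.9999, eps: float = 1e-6) -> None:
        self.lr = lr
        self.beta1 = beta1
        self.beta2 = beta2
        self.eps = eps
        self.m: List[float] = []  # momentum of the *scaled* gradient
        self.v: List[float] = []  # second-moment estimate
        self.initialized = False

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        """Return updated parameters for one stochastic gradient `grads`."""
        if not self.initialized:
            # Assumed initialisation: the first gradient only seeds v with g0^2,
            # and the parameters are returned unchanged.
            self.v = [g * g for g in grads]
            self.m = [0.0] * len(grads)
            self.initialized = True
            return list(params)

        new_params: List[float] = []
        for i, (p, g) in enumerate(zip(params, grads)):
            # Change 1: scale by v_{t-1}, which does not yet include g_t.
            denom = max(math.sqrt(self.v[i]), self.eps)
            # Change 2: momentum accumulates the already-scaled gradient
            # (Adam instead scales the momentum after accumulating raw gradients).
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g / denom
            new_params.append(p - self.lr * self.m[i])
            # Only now is the current gradient folded into the second moment.
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g
        return new_params


if __name__ == "__main__":
    # Toy problem: minimise f(x) = (x - 3)^2, whose gradient is 2(x - 3).
    opt = AdoptSketch(lr=0.1)
    x = [0.0]
    for _ in range(500):
        x = opt.step(x, [2.0 * (x[0] - 3.0)])
    print(f"x after 500 steps: {x[0]:.6f}")
```

For contrast, Adam (ignoring bias correction) first folds the current gradient into both the momentum and the second-moment estimate. It then divides the momentum by the square root of that updated estimate, so the current gradient appears in both the numerator and the denominator. The toy run above is deterministic and only checks that the update drives `x` toward the minimiser 3. It says nothing about the convergence guarantee stated in the abstract, which concerns the stochastic setting.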
