2021 Volume 50 Issue 2 Pages 229-256
In this paper, we introduce statistical theories of deep learning that address why deep learning works well. In particular, we discuss its function approximation ability and estimation ability, and show that deep learning can adaptively estimate a target function. To that end, we first explain its universal approximation ability, and then discuss its estimation ability for function classes such as the Barron class and the anisotropic Besov space, for which we discuss the minimax optimality of estimators. We show that, unlike linear estimators, deep learning has favorable properties such as avoiding the curse of dimensionality and adapting to inhomogeneous smoothness of the target function. Finally, we present a generalization error analysis of deep learning that utilizes the perspective of kernel methods to explain how deep learning can generalize even when the number of parameters is larger than the sample size.
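The universal approximation ability mentioned above can be illustrated numerically. The sketch below is not from the paper: it fits a one-hidden-layer ReLU network to a smooth 1-D target by drawing the inner weights at random and solving only for the output weights by least squares (a random-feature simplification). As the width grows, the approximation error shrinks, which is the qualitative content of universal approximation; the target function, weight scales, and widths are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative smooth 1-D target function (not from the paper).
def target(x):
    return np.sin(2 * np.pi * x)

x = np.linspace(0.0, 1.0, 200)
y = target(x)

def relu_random_feature_fit(width):
    """Approximate y with a width-`width` one-hidden-layer ReLU network.

    Inner weights/biases are random and fixed; only the output layer
    is fit by least squares, so this is a random-feature sketch of
    universal approximation rather than full network training.
    """
    w = rng.normal(size=width) * 10.0          # random inner weights
    b = rng.uniform(-10.0, 10.0, size=width)   # random biases
    phi = np.maximum(x[:, None] * w[None, :] + b[None, :], 0.0)  # ReLU features
    coef, *_ = np.linalg.lstsq(phi, y, rcond=None)  # fit output layer
    return phi @ coef

err_narrow = np.max(np.abs(relu_random_feature_fit(5) - y))
err_wide = np.max(np.abs(relu_random_feature_fit(200) - y))
print(f"max error, width 5:   {err_narrow:.4f}")
print(f"max error, width 200: {err_wide:.4f}")
```

Increasing the width enlarges the class of piecewise-linear functions the network can represent, so the wide network's error is much smaller than the narrow one's.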
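For reference, the Barron class mentioned above is commonly defined through a Fourier moment condition (Barron, 1993); the statement below is the standard formulation, not a quotation from this paper:

```latex
% A function $f$ on $\mathbb{R}^d$ with Fourier transform $\hat f$
% belongs to the Barron class with constant $C$ if
\int_{\mathbb{R}^d} \|\omega\| \, |\hat f(\omega)| \, d\omega \le C .
% For such $f$, a one-hidden-layer network with $m$ units achieves
% squared approximation error $O(C^2/m)$, a rate independent of the
% dimension $d$ -- the sense in which the curse of dimensionality
% is avoided for this class.
```

This dimension-free rate is what distinguishes neural network approximation on the Barron class from fixed linear approximation schemes, whose rates degrade with the dimension.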