This paper derives the cumulative distribution function and probability density function of the continuous transitional distribution of the normalized Ordinary Least Squares (OLS) estimator for an autoregressive process with local-to-moderate deviations from a unit root. The derivation of these distributions is based on the asymptotic results in Phillips, Magdalinos and Giraitis (2010). The approach is to first derive the asymptotic joint characteristic function of the numerator and denominator of the normalized OLS estimator and then invert it by numerical integration to obtain these distributions. We present some graphs to illustrate the behavior of the smooth transition. We also provide tables of percentiles of the limiting distribution of the normalized OLS estimator and assess the approximation accuracy of the distribution in detail.
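The inversion step described above, recovering a CDF and PDF from a characteristic function by numerical integration, can be sketched with the Gil-Pelaez formulas. The sketch below uses the standard normal characteristic function as a stand-in, since the paper's joint characteristic function is not reproduced here; it is an illustration of the numerical technique, not the paper's implementation.

```python
import numpy as np
from scipy.integrate import quad

def cf_normal(t):
    # Characteristic function of N(0, 1); a stand-in for the (more
    # complicated) characteristic function derived in the paper.
    return np.exp(-0.5 * t**2)

def cdf_from_cf(x, cf, upper=50.0):
    # Gil-Pelaez: F(x) = 1/2 - (1/pi) * int_0^inf Im[e^{-itx} cf(t)] / t dt
    integrand = lambda t: np.imag(np.exp(-1j * t * x) * cf(t)) / t
    val, _ = quad(integrand, 1e-12, upper, limit=200)
    return 0.5 - val / np.pi

def pdf_from_cf(x, cf, upper=50.0):
    # Density by Fourier inversion: f(x) = (1/pi) * int_0^inf Re[e^{-itx} cf(t)] dt
    integrand = lambda t: np.real(np.exp(-1j * t * x) * cf(t))
    val, _ = quad(integrand, 0.0, upper, limit=200)
    return val / np.pi
```

With the normal stand-in, `cdf_from_cf(0.0, cf_normal)` recovers 0.5 and `pdf_from_cf(0.0, cf_normal)` recovers the normal density at zero, which is how the accuracy of such an inversion scheme can be checked against a known case.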
In this paper, we introduce some statistical theories of deep learning toward answering why deep learning works well. In particular, we discuss its function approximation ability and estimation ability, and show that deep learning can adaptively estimate a target function. To that end, we first explain its universal approximation ability, and then discuss its estimation ability for function classes such as the Barron class and the anisotropic Besov space, for which we discuss the minimax optimality of estimators. We show that deep learning has favorable properties, such as avoiding the curse of dimensionality and adapting to inhomogeneous smoothness of the target function, unlike linear estimators. Finally, we present a generalization error analysis of deep learning that uses the perspective of kernel methods to explain how deep learning can generalize even though the number of parameters is larger than the sample size.
In this paper, we discuss a theory of generalization error, which explains a principle of deep learning. Deep learning is a statistical methodology that uses multi-layer neural networks as models and has been in the spotlight owing to its high performance. Despite this empirical success, the complicated structure of multi-layer models means that understanding the mechanism behind their high performance remains a developing problem. This paper focuses on several attempts to describe the principles of this performance, with a particular focus on the approximation error of neural network models and the complexity error arising from the learning procedure. Further, we discuss several elucidated and unexplained parts of the principles of deep learning.
In this paper, we present recent results on the integral representation of neural networks. In the theoretical study of deep learning, an approach that uses the integral representation to treat neural networks in a functional-analytic manner is developing. However, the structure of the domain space of the integral representation operator S is mostly unknown. It is little known that there even exists an infinite-dimensional null space ker S. In this paper, we consider several problems related to integral representations and discuss the characterizations of ker S in each context.
In supervised learning, acquiring labeled training data for a predictive model can be very costly, while acquiring a large amount of unlabeled data is often quite easy. Active learning is a method for obtaining predictive models with high precision at a limited cost through the adaptive selection of samples for labeling. This study explains the basic problem settings of active learning and recent research trends. In particular, we highlight research on learning acquisition functions that select samples for labeling, theoretical work on active learning algorithms, and stopping criteria for sequential data acquisition. Application examples demonstrating improved efficiency in materials development and measurement are also introduced.
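The pool-based setting with an acquisition function can be sketched as follows. This is a minimal illustration of one classical acquisition rule, uncertainty sampling, on synthetic two-class data; the data, model, and budget are all hypothetical and not taken from the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic pool: two Gaussian classes in 2D (illustrative only).
X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(1, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

# Small labeled seed set containing both classes; the rest is the unlabeled pool.
labeled = list(rng.choice(200, 5, replace=False)) + \
          list(200 + rng.choice(200, 5, replace=False))
pool = [i for i in range(len(X)) if i not in labeled]

for _ in range(20):  # query budget
    clf = LogisticRegression().fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])[:, 1]
    # Uncertainty-sampling acquisition: query the point closest to p = 0.5.
    idx = pool[int(np.argmin(np.abs(proba - 0.5)))]
    labeled.append(idx)   # in practice, an oracle supplies y[idx] here
    pool.remove(idx)
```

Each iteration refits the model on the current labels and spends one query on the most ambiguous pool point, which is the adaptive selection the abstract refers to; learned acquisition functions replace the fixed `|p - 0.5|` rule with a trained criterion.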
Subset selection for regression models has long been recognized as an important task in statistics, and recently it has been actively studied in data mining and machine learning because of the increasing amounts of data being handled. A mixed-integer optimization approach, which has received renewed attention for subset selection, is to formulate subset selection as a mathematical optimization problem and to solve it using a branch-and-bound method. The greatest advantage of this approach is that the best subset of explanatory variables can be selected with respect to the objective function (i.e., the evaluation criterion of the regression model) of the optimization problem. The authors have devised several formulations for simultaneously optimizing a subset of variables and its cardinality in terms of statistical criteria such as Mallows' Cp, adjusted R2, information criteria, and the cross-validation criterion. In this paper, we explain these mixed-integer optimization formulations for best subset selection in linear regression models.
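The objective being optimized can be made concrete with a brute-force sketch: enumerate every subset, fit OLS on it, and pick the subset maximizing adjusted R2. This jointly chooses the subset and its cardinality, as in the formulations above, but by exhaustive search rather than by the mixed-integer/branch-and-bound machinery the paper develops; the data are synthetic.

```python
import itertools
import numpy as np

def adjusted_r2(X, y, subset):
    # OLS fit on the chosen columns (with intercept); adjusted R^2
    # penalizes subset size through the residual degrees of freedom.
    n = len(y)
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = ((y - Z @ beta) ** 2).sum()
    tss = ((y - y.mean()) ** 2).sum()
    k = len(subset)
    return 1 - (rss / (n - k - 1)) / (tss / (n - 1))

def best_subset(X, y):
    # Brute-force stand-in for the branch-and-bound search: the subset
    # and its cardinality are optimized simultaneously under adjusted R^2.
    p = X.shape[1]
    candidates = itertools.chain.from_iterable(
        itertools.combinations(range(p), k) for k in range(1, p + 1))
    return max(candidates, key=lambda s: adjusted_r2(X, y, s))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))
y = 2 * X[:, 0] - 3 * X[:, 2] + rng.normal(scale=0.5, size=100)
sel = best_subset(X, y)
```

Enumeration is exponential in the number of variables, which is exactly why the mixed-integer formulation with branch-and-bound matters: it certifies the same optimum without visiting all subsets.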
Multiple regression analysis is, without doubt, one of the most important and useful methods of statistical data analysis. It is also a fact, however, that the method is often misused in practical applications. In the present paper, in addition to the textbook description of multiple regression analysis, some issues that seem important in applying the methodology to various practical problems are examined from the viewpoint of statistical causal inference. Some paradoxical examples are also presented that may be of interest to students attending a class on multiple regression analysis.
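One classic paradox of the kind referred to above is Simpson's paradox in regression. The sketch below is a hypothetical numerical instance, not one of the paper's examples: within each of two groups the slope of y on x is positive, yet the pooled regression slope is negative because the groups differ in level.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two groups, each with true within-group slope +1, but the higher-x
# group sits at a lower level of y (a confounding group effect).
x1 = rng.uniform(0, 1, 100); y1 = 5 + x1 + rng.normal(0, 0.1, 100)
x2 = rng.uniform(2, 3, 100); y2 = 1 + x2 + rng.normal(0, 0.1, 100)

slope = lambda x, y: np.polyfit(x, y, 1)[0]
print(slope(x1, y1), slope(x2, y2))          # both within-group slopes near +1
print(slope(np.r_[x1, x2], np.r_[y1, y2]))   # pooled slope is negative
```

The sign reversal occurs because the between-group association (higher x, lower y) dominates the pooled covariance; controlling for group membership in the regression recovers the positive within-group effect, which is the causal-inference point at issue.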
Recently, mean-reversion proposals have become more common in the Markov chain Monte Carlo literature. We review some of them, such as the (mixed) preconditioned Crank–Nicolson kernel, the Beta-Gamma kernel, and the Chi-square kernel. We also review piecewise deterministic Markov processes used in Monte Carlo methods, such as the bouncy particle sampler and the zig-zag sampler. Finally, we review the boomerang sampler, which has been proposed as a mean-reverting version of the bouncy particle sampler.
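The mean-reverting flavor of these kernels can be seen in a minimal sketch of the (plain, one-dimensional) preconditioned Crank–Nicolson kernel; the Gaussian target below is an illustrative toy, not an example from the paper.

```python
import numpy as np

def pcn_sampler(loglik, n_steps=50_000, beta=0.5, seed=0):
    # Preconditioned Crank-Nicolson kernel for a target proportional to
    # exp(loglik(x)) * N(0, 1) prior density. The proposal
    #     x' = sqrt(1 - beta^2) * x + beta * xi,   xi ~ N(0, 1),
    # shrinks x toward 0 (mean reversion) and is reversible with respect
    # to the prior, so the acceptance ratio involves only the likelihood.
    rng = np.random.default_rng(seed)
    x, samples = 0.0, []
    for _ in range(n_steps):
        prop = np.sqrt(1 - beta**2) * x + beta * rng.normal()
        if np.log(rng.uniform()) < loglik(prop) - loglik(x):
            x = prop
        samples.append(x)
    return np.array(samples)

# Toy target: N(0, 1) prior with one observation y = 1 under unit noise,
# so the posterior is N(0.5, 0.5).
draws = pcn_sampler(lambda x: -0.5 * (x - 1.0) ** 2)
```

Because the prior is handled exactly by the proposal, the acceptance probability stays bounded away from zero under prior refinement, which is the property that motivates this family of kernels.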
Multivariate stochastic volatility (MSV) models and their extensions have been widely developed to capture the multivariate volatility of returns on asset prices, such as stock prices and foreign exchange rates. As return distributions of asset prices tend to be skewed, this paper introduces MSV models with a skew-t distribution. A shrinkage method for the skewness parameters is also introduced. An empirical analysis using daily stock returns in the S&P 500 shows that the skew-t distribution and the shrinkage method improve the predictive ability of the MSV model.
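A skew-t innovation of the kind used for such return models can be generated by an Azzalini-type construction: a skew-normal variate divided by the square root of an independent scaled chi-square. This is a generic sketch of that construction and not necessarily the exact parameterization of the paper's MSV model.

```python
import numpy as np

def skew_t_draws(n, alpha=3.0, nu=10.0, seed=0):
    # Azzalini-type skew-t: skew-normal numerator over sqrt(chi2_nu / nu).
    # alpha controls skewness (alpha = 0 recovers a Student-t); nu controls
    # tail heaviness. Illustrative parameterization only.
    rng = np.random.default_rng(seed)
    delta = alpha / np.sqrt(1 + alpha**2)
    u0, u1 = rng.normal(size=(2, n))
    z = delta * np.abs(u0) + np.sqrt(1 - delta**2) * u1  # skew-normal
    w = rng.chisquare(nu, n) / nu
    return z / np.sqrt(w)

x = skew_t_draws(100_000)  # right-skewed, heavy-tailed draws for alpha > 0
```

With alpha = 0 the draws reduce to symmetric Student-t returns, so shrinking the skewness parameters toward zero, as in the paper's shrinkage method, pulls the model back toward the standard symmetric heavy-tailed case.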