Interdisciplinary Information Sciences
Online ISSN : 1347-6157
Print ISSN : 1340-9050
ISSN-L : 1340-9050
Volume 26, Issue 1
Displaying 1–3 of 3 articles from this issue
Special Issue
GP-DS Lectures: Statistics, Machine Learning, and Graph Theory for Data Science
  • Yuki IRIE
    2020 Volume 26 Issue 1 Pages 1-39
    Published: 2020
    Released on J-STAGE: December 26, 2020
    Advance online publication: August 25, 2020
    JOURNAL FREE ACCESS

    We provide an introduction to graph theory and linear algebra. The present article consists of two parts. In the first part, we review the transfer-matrix method. It is known that many enumeration problems can be reduced to counting walks in a graph. After recalling the basics of linear algebra, we count walks in a graph by using eigenvalues. In the second part, we introduce PageRank by using a random walk model. PageRank is a method for estimating the importance of web pages and is one of the most successful ranking algorithms. This article is based on the author's lectures at Tohoku University in 2018 and 2020.
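    The two techniques this abstract mentions can be sketched in a few lines of code. The following is an illustrative sketch, not taken from the article itself: it counts walks via powers of the adjacency matrix (the entry of A^k at position (i, j) is the number of length-k walks from i to j), and runs PageRank by power iteration on a simple random-walk model. The graphs, function names, and the damping value 0.85 are assumptions chosen for illustration.

    ```python
    # Sketch (not from the article): walk counting with adjacency-matrix
    # powers, and PageRank by power iteration on a random-walk model.

    def mat_mul(a, b):
        """Multiply two square matrices given as lists of lists."""
        n = len(a)
        return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    def count_walks(adj, length):
        """(A^length)[i][j] = number of walks of that length from i to j."""
        power = adj
        for _ in range(length - 1):
            power = mat_mul(power, adj)
        return power

    # Triangle graph: every pair of distinct vertices is adjacent.
    triangle = [[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]]
    walks3 = count_walks(triangle, 3)
    # Exactly two closed walks of length 3 start and end at vertex 0:
    # 0 -> 1 -> 2 -> 0 and 0 -> 2 -> 1 -> 0.
    print(walks3[0][0])  # 2

    def pagerank(out_links, damping=0.85, iterations=100):
        """Power iteration for PageRank on a graph given by out-link lists."""
        n = len(out_links)
        rank = [1.0 / n] * n
        for _ in range(iterations):
            new = [(1.0 - damping) / n] * n
            for page, links in enumerate(out_links):
                if links:          # distribute rank along out-links
                    share = damping * rank[page] / len(links)
                    for target in links:
                        new[target] += share
                else:              # dangling page: spread rank uniformly
                    for target in range(n):
                        new[target] += damping * rank[page] / n
            rank = new
        return rank

    # Tiny web graph: page 0 links to 1 and 2; pages 1 and 2 link back to 0.
    ranks = pagerank([[1, 2], [0], [0]])
    print([round(r, 3) for r in ranks])  # page 0 receives the highest rank
    ```

    The article itself computes walk counts via eigenvalues rather than by repeated multiplication; the matrix-power form above is the definition that the eigenvalue method accelerates.
    
    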

  • Mohammad Samy BALADRAM, Nobuaki OBATA
    2020 Volume 26 Issue 1 Pages 41-86
    Published: 2020
    Released on J-STAGE: December 26, 2020
    Advance online publication: December 8, 2020
    JOURNAL FREE ACCESS

    These lecture notes provide a quick review of basic concepts in statistical analysis and probability theory for data science. We survey the general description of single- and multivariate data, and derive regression models by means of the method of least squares. As theoretical background, we provide basic knowledge of probability theory, which is indispensable for further study of mathematical statistics and probability models. We show that the regression line for a multivariate normal distribution coincides with the regression curve defined through the conditional density function. In the Appendix, matrix operations are briefly reviewed. These notes are based on the lectures delivered in the Graduate Program in Data Science (GP-DS) and the Data Sciences Program (DSP) at Tohoku University in 2018–2020.
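    The method of least squares mentioned in this abstract has a closed-form solution in the single-variable case: the slope is cov(x, y)/var(x) and the intercept is mean(y) − slope · mean(x). A minimal sketch, not taken from the lecture notes, with data chosen for illustration:

    ```python
    # Sketch (not from the lecture notes): fitting a regression line
    # y = a*x + b by the method of least squares, using the closed form
    # a = cov(x, y) / var(x), b = mean(y) - a * mean(x).

    def least_squares_line(xs, ys):
        """Return slope a and intercept b minimizing the sum of squared residuals."""
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        var_x = sum((x - mean_x) ** 2 for x in xs)
        a = cov_xy / var_x
        b = mean_y - a * mean_x
        return a, b

    # Data lying exactly on y = 2x + 1 is recovered exactly.
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]
    a, b = least_squares_line(xs, ys)
    print(a, b)  # 2.0 1.0
    ```

    The multivariate regression models in the notes generalize this by solving the normal equations in matrix form, which is where the matrix operations reviewed in the Appendix come in.
    
    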

  • Mohammad Samy BALADRAM, Atsushi KOIKE, Kazunori D YAMADA
    2020 Volume 26 Issue 1 Pages 87-121
    Published: 2020
    Released on J-STAGE: December 26, 2020
    JOURNAL FREE ACCESS

    We present an introduction to supervised machine learning methods, with emphasis on neural networks, kernel support vector machines, and decision trees; these are representative methods of supervised learning. Recently, there has been a boom in artificial intelligence research. Neural networks are a key concept of deep learning and are the origin of the current boom. Support vector machines are among the most sophisticated learning methods in terms of prediction performance. Their high performance is primarily owing to the kernel method, an important concept not only for support vector machines but also for other machine learning methods. Whereas neural networks and support vector machines are so-called black-box methods, the decision tree is a white-box method, whose judgment criteria for prediction can be easily interpreted. Decision trees are also used as the base method of ensemble learning, a refined technique for improving prediction performance. We review the theory of these supervised learning methods and illustrate their applications. We also discuss nonlinear optimization methods by which the machine learns the training dataset.
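    The white-box character of decision trees described in this abstract can be seen already in the simplest possible tree, a depth-1 "decision stump" on a single feature. The following sketch is not taken from the article: it picks the split threshold that minimizes weighted Gini impurity (a common splitting criterion; the article may use a different one), and the resulting rule is directly human-readable. The data and function names are assumptions for illustration.

    ```python
    # Sketch (not from the article): a depth-1 decision tree ("stump") on one
    # feature, with the split chosen by minimizing weighted Gini impurity.

    def gini(labels):
        """Gini impurity of a list of 0/1 class labels."""
        n = len(labels)
        if n == 0:
            return 0.0
        p1 = sum(labels) / n            # fraction of class 1
        return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

    def best_stump(xs, ys):
        """Return the threshold on a single feature with lowest weighted Gini."""
        n = len(xs)
        best_score, best_threshold = float("inf"), None
        for threshold in sorted(set(xs)):
            left = [y for x, y in zip(xs, ys) if x <= threshold]
            right = [y for x, y in zip(xs, ys) if x > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best_score, best_threshold = score, threshold
        return best_threshold

    # Perfectly separable 1-D data: class 0 at small x, class 1 at large x.
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [0, 0, 1, 1]
    threshold = best_stump(xs, ys)
    # The learned rule "if x <= 2.0 then class 0, else class 1" is itself the
    # explanation of every prediction -- the white-box property.
    print(threshold)  # 2.0
    ```

    A full decision tree applies this split search recursively to each branch, and ensemble methods such as random forests combine many such trees.
    
    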
