Host: The Japanese Society for Artificial Intelligence
Name: The 32nd Annual Conference of the Japanese Society for Artificial Intelligence, 2018
Number: 32
Location: [in Japanese]
Date: June 05, 2018 - June 08, 2018
This paper proposes a novel method for training neural networks with a limited amount of training data. Our approach builds on knowledge distillation, which transfers knowledge from a deep reference neural network to a shallow target network. The proposed method employs this idea to mimic the predictions of non-neural-network reference models that are more robust against overfitting than the target neural network. Unlike almost all previous work on knowledge distillation, which requires a large amount of labeled training data, the proposed method requires only a small amount of training data. Instead, we introduce pseudo training data that is optimized as part of the model parameters.
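The following is a minimal sketch, not the authors' implementation, of the idea described in the abstract. It assumes PyTorch and scikit-learn as stand-ins: a shallow target network is trained to mimic a non-neural reference model (a random forest here, chosen only for illustration) on a small labeled set plus pseudo inputs registered as trainable parameters. Because the non-neural reference is not differentiable, the pseudo inputs in this sketch receive gradients only through the target network; the paper's actual optimization may differ.

# Sketch only: distillation from a non-neural reference model with
# pseudo training data optimized jointly with the target network.
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestRegressor

# Small real training set (shapes assumed for illustration only).
rng = np.random.default_rng(0)
X_real = rng.normal(size=(20, 4)).astype(np.float32)
y_real = X_real.sum(axis=1, keepdims=True).astype(np.float32)

# Non-neural reference model, relatively robust to overfitting.
reference = RandomForestRegressor(n_estimators=50, random_state=0)
reference.fit(X_real, y_real.ravel())

# Shallow target neural network.
target = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))

# Pseudo training data treated as additional trainable parameters.
X_pseudo = nn.Parameter(torch.randn(50, 4))

optimizer = torch.optim.Adam(list(target.parameters()) + [X_pseudo], lr=1e-2)
mse = nn.MSELoss()

for step in range(500):
    optimizer.zero_grad()
    # Reference predictions on the current pseudo inputs (no gradient
    # flows through the random forest; it is queried as a black box).
    ref_pseudo = torch.from_numpy(
        reference.predict(X_pseudo.detach().numpy()).astype(np.float32)
    ).unsqueeze(1)
    # Distillation loss: mimic the reference model on pseudo data.
    loss_pseudo = mse(target(X_pseudo), ref_pseudo)
    # Supervised loss on the small amount of real labeled data.
    loss_real = mse(target(torch.from_numpy(X_real)), torch.from_numpy(y_real))
    loss = loss_real + loss_pseudo
    loss.backward()
    optimizer.step()

In this sketch the two loss terms are simply summed; any weighting between the real-data loss and the distillation loss is an assumption, not something specified in the abstract.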