2017 Volume 56 Issue 2 Pages 157-162
In this article, I give an overview of recent developments in deep learning, focusing on its applications to images. I first introduce new methods for designing and training neural networks that have been proposed since the birth of deep learning and are now considered standard. I then explain that some recent applications of convolutional neural networks even perform computation similar to global optimization, which is hard to interpret within the framework of classical pattern recognition. Owing to these developments, most visual recognition tasks can now be solved by deep neural networks, provided that a sufficient amount of training data is available. Even so, a gap to human vision remains. To evaluate this gap, I choose and explain a task called VQA (visual question answering), in which the computer is given an image of a scene and a natural-language question about it and must answer the question. I conclude the article by briefly outlining possible future research directions.