Host: The Japan Society of Mechanical Engineers
Name: [in Japanese]
Date: May 27, 2020 - May 30, 2020
Sensing technology has advanced dramatically in recent years, and a wide variety of sensors are now used in systems such as automated driving and robotics. When humans recognize their environment, information from the five senses is transmitted to the sensory areas of the cerebrum and processed; the processed information is then transmitted to the association areas, where it is fused. A similar fusion mechanism, modeled on the human one, is expected for robot sensor fusion as well. In this study, we propose a system that performs word recognition by fusing visual data from an RGB camera and voice data from a microphone, using a CNN that can automatically extract features. We train the network on the visual and voice data, and evaluate the system in terms of its word recognition rate and the feasibility of sensor fusion by deep learning.
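The abstract does not specify the network architecture, but the described fusion of camera and microphone data can be sketched as a two-branch model whose per-modality features are concatenated before classification. Below is a minimal NumPy illustration of that late-fusion idea; the feature dimensions, the random (untrained) projection weights, and the `extract`/`fuse_and_classify` helpers are all hypothetical stand-ins for the learned CNN branches, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two CNN branches: each maps raw sensor
# data to a fixed-length feature vector. The weights here are random;
# in the proposed system they would be learned during training.
W_visual = rng.standard_normal((64, 32))  # 64-dim image input  -> 32 features
W_audio = rng.standard_normal((40, 32))   # 40-dim audio input  -> 32 features

def extract(x, W):
    """Toy feature extractor: linear projection followed by ReLU."""
    return np.maximum(x @ W, 0.0)

def fuse_and_classify(visual, audio, W_out):
    """Late fusion: concatenate per-modality features, then classify."""
    fused = np.concatenate([extract(visual, W_visual),
                            extract(audio, W_audio)])  # 64-dim fused vector
    logits = fused @ W_out
    e = np.exp(logits - logits.max())
    return e / e.sum()  # softmax over candidate words

n_words = 10  # hypothetical vocabulary size
W_out = rng.standard_normal((32 + 32, n_words))

probs = fuse_and_classify(rng.standard_normal(64),
                          rng.standard_normal(40), W_out)
print(probs.shape)  # one probability per word in the vocabulary
```

The recognized word would be `probs.argmax()`; concatenation is only one fusion strategy, and the actual system may fuse the modalities at a different layer of the CNN.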