Audio-Visual Speech Recognition Using Convolutive Bottleneck Networks for a Person with Severe Hearing Loss

Yuki Takashima; Yasuhiro Kakihara; Ryo Aihara; Tetsuya Takiguchi; Yasuo Ariki; Nobuyuki Mitani; Kiyohiro Omori; Kaoru Nakazono

doi:10.2197/ipsjtcva.7.64



Keywords: multimodal, lip reading, deep-learning, assistive technology


2015 Volume 7 Pages 64-68

DOI https://doi.org/10.2197/ipsjtcva.7.64


Abstract

In this paper, we propose an audio-visual speech recognition system for a person with an articulation disorder resulting from severe hearing loss. The speech style of a person with this type of articulation disorder is quite different from that of people without hearing loss, with the result that a speaker-independent model for unimpaired persons is hardly useful for recognizing it. We investigate an audio-visual speech recognition system for a person with severe hearing loss in noisy environments, where a robust feature extraction method using a convolutive bottleneck network (CBN) is applied to audio-visual data. We confirmed the effectiveness of this approach through word-recognition experiments in noisy environments, where the CBN-based feature extraction method outperformed the conventional methods.

