Phoneme recognition using time-warping neural networks

Kiyoaki Aikawa

doi:10.1250/ast.13.395

Abstract

This paper proposes a novel neural network architecture for phoneme-based speech recognition. The new architecture is composed of five time-warping sub-networks and an output layer that integrates the sub-networks. Each time-warping sub-network has a different time-warping function embedded between the input layer and the first hidden layer. Each sub-network recognizes the input speech by warping the time axis with its own time-warping function. The network is called the Time-Warping Neural Network (TWNN). The purpose of this network is to cope with the temporal variability of acoustic-phonetic features. The TWNN achieves higher phoneme recognition accuracy than a baseline recognizer composed of time-delay neural networks with a linear time alignment mechanism.
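
The structure described in the abstract can be illustrated with a minimal sketch of the forward pass. It is not the paper's implementation: the abstract does not specify the warping functions, layer sizes, or training procedure. The sketch assumes hypothetical power-law warps (u -> u**gamma on a normalized time axis), a single tanh hidden layer per sub-network, and random untrained weights. The names warp_time_axis and TimeWarpingSketch are invented for this illustration.

```python
import numpy as np


def warp_time_axis(frames, gamma):
    """Resample frames (T, D) along time with the assumed warp u -> u**gamma.

    The power-law warp is an illustrative assumption, not the paper's function.
    """
    n_frames = frames.shape[0]
    u = np.linspace(0.0, 1.0, n_frames)        # normalized output time
    src = (u ** gamma) * (n_frames - 1)        # warped source positions
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, n_frames - 1)
    frac = (src - lo)[:, None]
    # Linear interpolation between neighbouring input frames.
    return (1.0 - frac) * frames[lo] + frac * frames[hi]


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


class TimeWarpingSketch:
    """Several warped sub-networks plus an integrating output layer.

    All sizes and the random initialization are hypothetical; no training
    is performed here.
    """

    def __init__(self, n_frames, n_features, n_hidden, n_phonemes, gammas, seed=0):
        rng = np.random.default_rng(seed)
        self.gammas = list(gammas)
        in_dim = n_frames * n_features
        n_sub = len(self.gammas)
        # One hidden layer and one score layer per sub-network.
        self.hidden_w = [rng.normal(0.0, 0.1, (in_dim, n_hidden)) for _ in range(n_sub)]
        self.score_w = [rng.normal(0.0, 0.1, (n_hidden, n_phonemes)) for _ in range(n_sub)]
        # Output layer that integrates the sub-network scores.
        self.out_w = rng.normal(0.0, 0.1, (n_sub * n_phonemes, n_phonemes))

    def forward(self, frames):
        sub_scores = []
        for gamma, w_h, w_s in zip(self.gammas, self.hidden_w, self.score_w):
            # The warp sits between the input layer and the first hidden layer.
            warped = warp_time_axis(frames, gamma).ravel()
            hidden = np.tanh(warped @ w_h)
            sub_scores.append(hidden @ w_s)
        return softmax(np.concatenate(sub_scores) @ self.out_w)
```

With five gamma values, for example [0.5, 0.75, 1.0, 1.5, 2.0], the sketch mirrors the five sub-networks named in the abstract. A gamma of 1.0 leaves the time axis unchanged, while values below or above 1.0 stretch the beginning or the end of the segment respectively.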
