Abstract
Emotion in speech degrades the performance of Automatic Speech Recognition (ASR) systems. To improve recognition accuracy on emotional speech, this paper investigates how formant frequencies and their slopes can enhance performance. To this end, the formant frequencies are neutralized with a hybrid approach that combines Dynamic Time Warping (DTW) and Multi-Layer Perceptron (MLP) neural networks. Each neutralized formant frequency is then used as a supplementary feature in a Hidden Markov Model (HMM)-based ASR system. Experimental results show that using the slopes of the neutralized formant frequencies as features improves the recognition rate by up to 2.1% for happiness and 3.6% for anger.
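The abstract does not detail how DTW and the MLP are combined or how the slope is computed. The sketch below rests on two assumptions. First, DTW aligns an emotional formant track with a neutral track of the same utterance, and the aligned frame pairs then serve as MLP training data. Second, "slope" means the standard regression-based delta over a small window. The names `dtw_path`, `aligned_pairs`, and `slope` are hypothetical, and the MLP itself is omitted.

```python
from typing import List, Sequence, Tuple


def dtw_path(x: Sequence[float], y: Sequence[float]) -> List[Tuple[int, int]]:
    """Align two 1-D formant tracks with classic DTW (absolute-difference cost)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j]: minimal accumulated cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from (n, m) to the origin, preferring the diagonal on ties
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return path


def aligned_pairs(emotional: Sequence[float], neutral: Sequence[float]) -> List[Tuple[float, float]]:
    """(emotional, neutral) value pairs along the DTW path.

    Assumption: such pairs would be the input/target data for a neutralizing MLP.
    """
    return [(emotional[i], neutral[j]) for i, j in dtw_path(emotional, neutral)]


def slope(track: Sequence[float], window: int = 2) -> List[float]:
    """Regression-based slope (delta) of a formant track; edges padded by repetition."""
    n = len(track)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        num = 0.0
        for k in range(1, window + 1):
            ahead = track[min(t + k, n - 1)]
            behind = track[max(t - k, 0)]
            num += k * (ahead - behind)
        out.append(num / denom)
    return out


if __name__ == "__main__":
    emotional_f1 = [500.0, 520.0, 520.0, 560.0, 600.0]  # illustrative values (Hz)
    neutral_f1 = [500.0, 520.0, 560.0, 600.0]
    print(dtw_path(emotional_f1, neutral_f1))
    print(slope(neutral_f1))
```

In this toy example, the repeated 520 Hz frame in the emotional track is mapped to the same neutral frame. The slope would then be appended to the (neutralized) formant value as an extra feature dimension for the HMM front end.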