Using neutralized formant frequencies to improve emotional speech recognition

Davood Gharavian; Mansour Sheikhan; Farhad Ashoftedel

doi:10.1587/elex.8.1155

Abstract

Emotion in speech degrades the performance of Automatic Speech Recognition (ASR) systems. To improve recognition accuracy on emotional speech, this paper investigates the effect of formant frequencies and their slopes. The formant frequencies are neutralized using a hybrid of Dynamic Time Warping (DTW) and Multi-Layer Perceptron (MLP) neural networks. Each neutralized formant frequency is then used as a supplementary feature in a Hidden Markov Model (HMM)-based ASR system. Experimental results show that using the slopes of the neutralized formant frequencies as features improves the recognition rate for the happiness and anger states by up to 2.1% and 3.6%, respectively.
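
To make the DTW step and the slope feature concrete, the following is a minimal sketch and not the authors' implementation. It assumes that DTW is used to time-align an emotional formant track with a neutral one, and that the slope is a simple frame-to-frame difference; the abstract specifies neither. The function names `dtw_path` and `slope` are hypothetical, and the MLP mapping is omitted.

```python
def dtw_path(a, b):
    """Align two formant tracks (lists of Hz values) with classic DTW.

    Returns (total_cost, path), where path is a list of (i, j) frame
    index pairs. Local cost is the absolute difference (an assumption).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


def slope(track):
    """Frame-to-frame difference of a formant track (assumed slope definition)."""
    return [track[t + 1] - track[t] for t in range(len(track) - 1)]


if __name__ == "__main__":
    emotional = [500.0, 520.0, 560.0, 560.0, 600.0]  # made-up F1 values in Hz
    neutral = [500.0, 550.0, 600.0]                  # made-up F1 values in Hz
    total, path = dtw_path(emotional, neutral)
    print(total, path)
    print(slope(neutral))
```

Under this reading, the aligned frame pairs could serve as input/target pairs for training a regressor such as an MLP, and the slope of its output would be appended to the ASR feature vector.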
