2013 Volume 133 Issue 1 Pages 200-210
Since the earliest studies of human behavior, emotions have attracted attention of researchers in many disciplines, including psychology, neuroscience, and lately computer science. Speech is considered a salient conveyor of emotional cues, and can be used as an important source for emotional studies. Speech is modulated for different emotions by varying frequency- and energy-related acoustic parameters such as pitch, energy, and formants. In this paper, we explore analyzing inter- and intra-subband energy variations to differentiate six emotions. The emotions considered are anger, disgust, fear, happiness, neutral, and sadness. In this research, Two-Layered Cascaded Subband Cepstral Coefficients (TLCS-CC) analysis was introduced to study energy variations within low and high arousal emotions as a novel approach for emotion classification. The new approach was compared with Mel frequency cepstral coefficients (MFCC) and log frequency power coefficients (LFPC). Experiments were conducted on the Berlin Emotional Data Corpus (BECD). With energy-related features, we could achieve average accuracy of 73.9% and 80.1% for speaker-independent and -dependent emotion classification respectively.
The transactions of the Institute of Electrical Engineers of Japan.C
The Journal of the Institute of Electrical Engineers of Japan