Deep-Learning-Based Speech Emotion Recognition Using Synthetic Bone-Conducted Speech

Md. Sarwar Hosain; Yosuke Sugiura; Nozomiko Yasui; Tetsuya Shimamura

doi:10.2299/jsp.27.151

Keywords: deep learning, air-conducted speech, speech emotion recognition, bone-conducted speech

Journal, free access

2023, Volume 27, Issue 6, pp. 151-163

Abstract

Speech emotion recognition has drawn extensive attention in recent years. We propose a deep learning (DL)-based speech emotion recognition method that uses synthetic bone-conducted (BC) speech. In the proposed model, air-conducted (AC) speech is transformed into BC speech using an infinite impulse response (IIR) filter. Data augmentation techniques are applied, and the parameters of the convolutional neural network (CNN) models are modified to improve the accuracy of the proposed model. Simulation results demonstrate that the proposed model outperforms the existing models in terms of recognition accuracy for BC speech. The accuracy of the proposed model is 72.50% for BC speech, whereas that of the existing model is 69.83% for AC speech. This is because BC speech can enhance low-frequency components, which are important for recognizing emotions.
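
The abstract does not give the coefficients or order of the IIR filter used for the AC-to-BC transformation. As a rough illustration of the general idea only, the sketch below applies a first-order low-pass IIR filter, which keeps low-frequency content and attenuates high-frequency content. The filter form, the 1000 Hz cutoff, the 16 kHz sample rate, and the names `one_pole_lowpass`, `sine`, and `rms` are all assumptions made for this example; they are not taken from the paper.

```python
import math


def one_pole_lowpass(signal, cutoff_hz, sample_rate_hz):
    """First-order IIR low-pass: y[n] = (1 - a) * x[n] + a * y[n-1].

    Hypothetical stand-in for the paper's IIR filter, whose actual
    design is not stated in the abstract.
    """
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out = []
    prev = 0.0  # filter state y[n-1], starting from rest
    for x in signal:
        prev = (1.0 - a) * x + a * prev
        out.append(prev)
    return out


def sine(freq_hz, sample_rate_hz, n_samples):
    """Unit-amplitude sine wave used as a test tone."""
    return [
        math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]


def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


SAMPLE_RATE = 16000  # assumed sample rate in Hz
CUTOFF = 1000        # assumed cutoff in Hz, chosen only for illustration

# Compare how much of a low tone and a high tone survives the filter.
low_tone = sine(200, SAMPLE_RATE, SAMPLE_RATE)
high_tone = sine(4000, SAMPLE_RATE, SAMPLE_RATE)

low_gain = rms(one_pole_lowpass(low_tone, CUTOFF, SAMPLE_RATE)) / rms(low_tone)
high_gain = rms(one_pole_lowpass(high_tone, CUTOFF, SAMPLE_RATE)) / rms(high_tone)

print(f"gain at 200 Hz:  {low_gain:.2f}")
print(f"gain at 4000 Hz: {high_gain:.2f}")
```

In this sketch the 200 Hz tone passes almost unchanged while the 4000 Hz tone is strongly attenuated, so the filtered signal is dominated by low-frequency components. A filter fitted to measured BC speech would likely have a higher order and a different frequency response.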
