Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 2L1-OS-9a-03

Comparative Analysis of Learning Data Augmentation Techniques for Speech Emotion Recognition
*Kazuya MERA, Tsuyoshi SAKANE, Yoshiaki KUROSAWA, Takezawa TOSHIYUKI
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Machine learning-based Speech Emotion Recognition (SER) and Emotional Speech Synthesis have gained popularity in recent years. However, preparing sufficient learning data that closely matches the intended use is challenging. One way to increase data volume is data augmentation. Various data augmentation methods have been proposed in the fields of Automatic Speech Recognition (ASR) and Image Recognition (IR). This paper proposes increasing the learning data for SER by applying data augmentation methods from the ASR and IR fields. Six data augmentation techniques (Time Stretch, Frequency Masking, Time Masking, Frequency Warping, Low-latency Low-resource Voice Conversion (LLVC), and CopyPaste) are applied to the SER learning data, and their effectiveness is compared. The experimental results indicate that applying multiple data augmentation methods enhances SER performance. In particular, the combination of LLVC and CopyPaste improved SER accuracy by 0.24 points over the baseline.
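
As a rough illustration of two of the techniques named above, Frequency Masking and Time Masking can be sketched as zeroing a random band of rows or columns in a spectrogram. This is a minimal sketch, not the authors' implementation: the function names, the (frequency bins x time frames) array layout, the mask value of zero, and the maximum widths are all assumptions chosen for the example.

```python
import numpy as np

def frequency_mask(spec, max_width, rng):
    """Return a copy of spec with one random band of frequency bins zeroed.

    spec is assumed to have shape (n_freq_bins, n_frames).
    """
    out = spec.copy()
    # Pick a band width in [0, max_width], capped at the number of bins.
    width = min(int(rng.integers(0, max_width + 1)), spec.shape[0])
    # Pick a start index so the band fits inside the spectrogram.
    start = int(rng.integers(0, spec.shape[0] - width + 1))
    out[start:start + width, :] = 0.0
    return out

def time_mask(spec, max_width, rng):
    """Return a copy of spec with one random span of time frames zeroed."""
    out = spec.copy()
    width = min(int(rng.integers(0, max_width + 1)), spec.shape[1])
    start = int(rng.integers(0, spec.shape[1] - width + 1))
    out[:, start:start + width] = 0.0
    return out

# Hypothetical input: 80 frequency bins by 200 frames, filled with ones
# so that masked regions are easy to see.
rng = np.random.default_rng(0)
spec = np.ones((80, 200))

# Applying both masks in sequence yields one augmented training example.
augmented = time_mask(frequency_mask(spec, 10, rng), 20, rng)
```

Because each call draws a new random band, running the functions repeatedly over the same utterance produces multiple distinct variants, which is how masking increases the volume of learning data.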

© 2024 The Japanese Society for Artificial Intelligence