Host : The Japanese Society for Artificial Intelligence
Name : The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 38
Location : [in Japanese]
Date : May 28, 2024 - May 31, 2024
End-to-end speech recognition models are known to perform well when trained on high-quality data. However, creating such data typically incurs significant labor and management costs. This study proposes an active-learning data selection method for efficiently annotating high-quality training data for speech recognition models. Using a Character Error Rate (CER) prediction model built on features computed from speech waveforms, we identified the data in the pool that should be annotated first. Furthermore, a speech recognition model trained on data selected by the proposed method outperformed models trained on randomly selected annotated data, indicating that the method contributes to the efficient creation of training data. Additionally, we found that labeling that is efficient with respect to label quality has a positive psychological effect on annotators, leading to cost savings and improved accuracy of the speech recognition model.
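The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the waveform features, the form of the CER prediction model, or the ranking direction, so everything below is an assumption made for illustration: the features (RMS energy, zero-crossing rate, duration), the k-nearest-neighbour regressor standing in for the CER predictor, the choice to prioritise utterances with the highest predicted CER, and all function names are hypothetical and not taken from the paper.

```python
import math


def waveform_features(samples, sample_rate=16000):
    """Return (rms, zero_crossing_rate, duration_s) for a list of float samples.

    Assumed features for illustration; the paper's actual features are not
    given in the abstract. Requires at least two samples.
    """
    n = len(samples)
    rms = math.sqrt(sum(x * x for x in samples) / n)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return (rms, crossings / (n - 1), n / sample_rate)


def predict_cer(features, labeled, k=3):
    """Stand-in CER predictor: mean CER of the k nearest labeled utterances.

    `labeled` is a list of (feature_tuple, measured_cer) pairs. Features are
    compared unnormalised here, which a real system would likely not do.
    """
    nearest = sorted(labeled, key=lambda item: math.dist(features, item[0]))[:k]
    return sum(cer for _, cer in nearest) / len(nearest)


def select_for_annotation(pool, labeled, budget, k=3):
    """Return up to `budget` utterance ids from `pool`, highest predicted CER first.

    Ranking by highest predicted CER is an assumption; the abstract only says
    that some data should be annotated preferentially.
    """
    scored = [
        (predict_cer(waveform_features(wave), labeled, k), uid)
        for uid, wave in pool.items()
    ]
    scored.sort(reverse=True)
    return [uid for _, uid in scored[:budget]]


# Synthetic demo data (not from the paper): two tiny "utterances".
demo_pool = {
    "quiet": [0.01, -0.01] * 100,
    "loud": [0.5, -0.5] * 100,
}
demo_labeled = [
    ((0.01, 1.0, 0.0125), 0.05),  # (features, measured CER), made-up values
    ((0.5, 1.0, 0.0125), 0.40),
]
print(select_for_annotation(demo_pool, demo_labeled, budget=1, k=1))
```

In a real pipeline the labeled set would grow after each annotation round and the predictor would be refit before the next selection, which is the usual active-learning loop; the sketch shows a single round only.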