An Efficient Data Annotation Method for Speech Recognition Models Using Active Learning

山野 陽祐; 田森 秀明; 杉野 かおり; 黒田 由加

doi:10.11517/pjsai.JSAI2024.0_4Xin222

Abstract

End-to-end speech recognition models are known to perform well when trained on high-quality data. However, creating such data typically incurs significant labor and management costs. This study proposes a data selection method based on active learning for efficiently annotating high-quality training data for speech recognition models. Using a Character Error Rate (CER) prediction model built on features computed from speech waveforms, we identified the data in the pool that should be annotated first. The speech recognition model trained with the proposed method outperformed models trained on data selected at random for annotation, which shows that the method contributes to the efficient creation of training data. We also found that labeling that is efficient in terms of label quality has a positive psychological effect on annotators, leading to cost savings and improved accuracy of the speech recognition model.
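
The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cer` and `select_for_annotation` are hypothetical, the predicted CER values are assumed to come from some already-trained prediction model, and ranking the pool from highest to lowest predicted CER is an assumption, since the abstract does not state which utterances are prioritized.

```python
from typing import List, Sequence


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    # Standard Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


def select_for_annotation(pool_ids: Sequence[str],
                          predicted_cer: Sequence[float],
                          budget: int) -> List[str]:
    """Pick `budget` utterances from the pool, highest predicted CER first.

    Assumption: a high predicted CER marks an utterance as worth annotating.
    The abstract does not specify the ranking direction.
    """
    if len(pool_ids) != len(predicted_cer):
        raise ValueError("pool_ids and predicted_cer must have the same length")
    ranked = sorted(zip(pool_ids, predicted_cer),
                    key=lambda pair: pair[1], reverse=True)
    return [utt_id for utt_id, _ in ranked[:budget]]
```

In this sketch, `cer` would supply training targets for the prediction model (by comparing a recognizer's output with a reference transcript), and `select_for_annotation` would then be applied to the model's predictions over the pool.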
