Deep Learning-Based Speech Recognition of the Toyama Dialect and Its Conversion to Standard Japanese

堀元 優花; 張 逸群; 斎藤 博昭

doi:10.11517/pjsai.JSAI2024.0_2C1GS701

Abstract

Motivated by a deep affection for Toyama, this study addresses speech recognition for the Toyama dialect. Although the dialect has a distinctive and appealing style, it can hinder communication with people from other regions. We therefore aim to develop a system that recognizes Toyama-dialect speech and converts it into standard Japanese, facilitating communication for visitors from other areas. We employ wav2vec 2.0 as the speech recognition model and two GPT-2 models for the conversion to standard Japanese. We built a Toyama dialect corpus and improved its quality through meticulous transcription. During training, all speech data were smoothed using RMS and augmented through masking. For automatic evaluation we used character error rate (CER) and word error rate (WER); human evaluation focused on semantic equivalence and grammaticality. Experiments show that our model outperformed the baseline, and the discussion verifies the effectiveness of our approach.
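
For readers unfamiliar with the two automatic metrics, the following is a minimal sketch of how CER and WER are commonly computed from an edit distance. It is not the authors' evaluation code. The function names `edit_distance`, `cer`, and `wer` are hypothetical, and the sketch assumes that text has already been segmented into space-separated words for WER, since Japanese is not written with word boundaries and the abstract does not say which tokenizer was used.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of r
                            curr[j - 1] + 1,      # insertion of h
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    # Assumption: spaces are ignored so that segmentation does not affect CER.
    ref = list(reference.replace(" ", ""))
    hyp = list(hypothesis.replace(" ", ""))
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    # Assumption: both strings are pre-segmented into space-separated words.
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

Both metrics divide the number of edits by the reference length, so a value can exceed 1.0 when the hypothesis contains many insertions.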
