Addressing ASR Challenges for Stuttered Speech Through a Speech Translation Framework

Natsumi Kubota; Sakriani Sakti

DOI: https://doi.org/10.11517/jsaislud.103.0_214

103rd (Mar. 2025)

Conference information

Host: The Japanese Society for Artificial Intelligence

Name: The 103rd SIG-SLUD

Number: 103

Date: March 20, 2025 - March 22, 2025

Pages 214-217

Abstract

Stuttered speech poses significant challenges for automatic speech recognition (ASR) because of its irregular patterns and the scarcity of annotated data. These limitations hinder the development of robust systems that can accurately recognize and process stuttered speech. To address these issues, this study proposes a novel approach that uses text-to-speech (TTS) technology for data augmentation, synthesizing realistic stuttered speech to supplement existing datasets. Using this augmented data, the study develops an ASR system within a speech translation framework that transforms stuttered speech into fluent text.
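The abstract does not specify how the stuttered speech is synthesized. As a hedged illustration only, the sketch below shows one way a TTS-based augmentation pipeline could prepare training pairs: disfluencies are injected into a fluent sentence, the resulting stuttered text would then be voiced by a TTS model (not shown), and the original fluent sentence is kept as the target. The four disfluency categories, the per-word probability `p`, and the names `stutter_word`, `make_pair`, and `TrainingPair` are hypothetical and are not taken from the authors' method.

```python
import random
from dataclasses import dataclass

# Hypothetical disfluency categories (assumption, not the paper's taxonomy):
# part-word repetition, whole-word repetition, prolongation, interjection.
INTERJECTIONS = ("um", "uh")


@dataclass
class TrainingPair:
    tts_input: str    # stuttered text that a TTS model would voice (assumed step)
    target_text: str  # fluent sentence used as the "translation" target


def stutter_word(word: str, rng: random.Random) -> str:
    """Return a disfluent rendering of a single word (hypothetical helper)."""
    kind = rng.choice(["part_word", "whole_word", "prolongation", "interjection"])
    if kind == "part_word":
        # Repeat the first letter 1-3 times, e.g. "but" -> "b- b- but"
        n = rng.randint(1, 3)
        return " ".join([word[0] + "-"] * n + [word])
    if kind == "whole_word":
        # Repeat the whole word once, e.g. "but" -> "but but"
        return f"{word} {word}"
    if kind == "prolongation":
        # Stretch the first character 2-4 times, e.g. "so" -> "ssso"
        return word[0] * rng.randint(2, 4) + word[1:]
    # Insert a filler before the word, e.g. "but" -> "um but"
    return f"{rng.choice(INTERJECTIONS)} {word}"


def make_pair(fluent: str, p: float = 0.3, seed: int = 0) -> TrainingPair:
    """Inject disfluencies into a fluent sentence, each word with probability p."""
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    words = [stutter_word(w, rng) if rng.random() < p else w for w in fluent.split()]
    return TrainingPair(tts_input=" ".join(words), target_text=fluent)


if __name__ == "__main__":
    pair = make_pair("please call me back tomorrow", p=0.5, seed=1)
    print(pair.tts_input)    # stuttered text to synthesize (varies with seed)
    print(pair.target_text)  # please call me back tomorrow
```

Keeping the fluent sentence as the target, rather than a verbatim transcript of the stuttered audio, is what frames the task as translation (disfluent speech to fluent text) instead of literal transcription.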