Synthesis of everyday conversational speech based on fine-tuning with a corpus for speech synthesis

Hiroki Mori; Kota Furukawa

doi:10.1250/ast.e24.35

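A final published version of this article is available. Please refer to the final published version.
When citing, please cite the final published version as well.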

Keywords: Speech synthesis, everyday conversation, conversational agent, prosody

Journal, Open Access, Advance online publication

Article ID: e24.35

DOI https://doi.org/10.1250/ast.e24.35

The final version of this article is now available: Vol. 46 (2025), No. 1 pp. 103-105

Abstract

In this letter, we propose a separate modeling of prosodic and segmental features for everyday conversational speech synthesis, addressing challenges posed by low-quality recordings in the Corpus of Everyday Japanese Conversation (CEJC). Initially, the FastSpeech 2 model is trained on the conversation corpus and subsequently fine-tuned on a corpus for speech synthesis. Experimental results show that this fine-tuning approach enhances synthesis quality while preserving the nuances of everyday conversations.
