Synthesis of everyday conversational speech based on fine-tuning with a corpus for speech synthesis

Hiroki Mori; Kota Furukawa

doi:10.1250/ast.e24.35

Abstract

In this letter, we propose modeling prosodic and segmental features separately for everyday conversational speech synthesis, addressing the challenges posed by low-quality recordings in the Corpus of Everyday Japanese Conversation (CEJC). The FastSpeech 2 model is first trained on the conversation corpus and then fine-tuned on a corpus for speech synthesis. Experimental results show that this fine-tuning approach enhances synthesis quality while preserving the nuances of everyday conversations.

This article is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International license.
https://creativecommons.org/licenses/by-nd/4.0/

Predecessor journal

Journal of the Acoustical Society of Japan (E)
