Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
PAPERS
SUST TTS Corpus: A phonetically-balanced corpus for Bangla text-to-speech synthesis
Arif Ahmad, Md. Reza Selim, Md. Zafar Iqbal, M. Shahidur Rahman
JOURNAL FREE ACCESS

2021 Volume 42 Issue 6 Pages 326-332

Abstract

This paper presents the Shahjalal University of Science and Technology Text-To-Speech Corpus (SUST TTS Corpus), a phonetically balanced speech corpus for Bangla speech synthesis. With the advancement of deep learning techniques, modern speech processing research, such as speech recognition and speech synthesis, is being conducted with various deep learning methods. Any state-of-the-art neural TTS system needs a large dataset to be trained efficiently. The lack of such datasets for under-resourced languages like Bangla is a major obstacle to developing TTS systems in those languages. To mitigate this problem and accelerate speech synthesis research in Bangla, we have developed a large-scale, phonetically balanced speech corpus containing more than 30 hours of speech. Our corpus includes 17,357 utterances spoken by a professional voice talent in a soundproof audio laboratory. We ensure that the corpus contains all possible Bangla phonetic units in sufficient amounts, making it phonetically balanced. We describe the process of creating the corpus in this paper. We also train a neural Bangla TTS system on our corpus and obtain a synthetic voice that is comparable to those of state-of-the-art TTS systems.

© 2021 by The Acoustical Society of Japan