Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
PAPERS
Applying generation process model constraint to fundamental frequency contours generated by hidden-Markov-model-based speech synthesis
Tetsuya MatsudaKeikichi HiroseNobuaki Minematsu
Author information
JOURNAL FREE ACCESS

2012 Volume 33 Issue 4 Pages 221-228

Details
Abstract
Speech synthesis based on hidden Markov models (HMMs) processes both segmental and prosodic features of speech together in a frame-by-frame manner. One benefit of this method is that time alignment of both features is kept automatically. However, when the training data are limited, frame-by-frame representation is not appropriate for prosodic features, which tightly related to speech units spreading a wide time span, such as words, phrases and so on. This causes an inherit problem in fundamental frequency (F0) contour generation by HMM-based speech synthesis. A method is developed to modify F0 contours in the framework of generation process model (henceforth, F0 model) by referring to linguistic information of input text (word boundary and accent type). It takes F0 variances obtained through HMM-based speech synthesis into account during the process. Through a listening experiment on synthetic speech, the method is proved to generate better quality as compared to the HMM-based speech synthesis on average. Since the F0 model can clearly relate its commands and linguistic (and para-/non- linguistic) information, the method has an additional advantage; changing speech styles, and/or adding further information (such as emphasis) can be easily done through manipulating the commands.
Content from these authors
© 2012 by The Acoustical Society of Japan
Previous article Next article
feedback
Top