Proposal of a Japanese-speech-synthesis Method with Dimensional Representation of Emotions based on Prosody as well as Voice-quality Conversion

Shoichi TAKEDA; Yoshiki KABUTA; Tomohiro INOUE; Masashi HATOKO

doi:10.5057/ijae.12.79

Abstract

This paper proposes a Japanese speech synthesis system capable of expressing variable degrees of emotion based on prosody conversion as well as voice-quality conversion. Among voice-quality features, we have found that spectral tilt depends on the type and degree of emotion, and we have accordingly introduced a spectral-tilt conversion rule into our speech-synthesis system. Our previous analyses showed that the spectral-tilt quantities increased as the degrees of “anger”, “joy”, and “crying-type (hot) sadness” increased, whereas they decreased as the degree of “dispirited-and-whispering-type (cold) sadness” increased. We formulate a transfer function that converts the spectral-tilt quantities of “neutral” speech to those of emotional speech at various degrees. The prosody-conversion rules are likewise determined from our previous findings. In informal listening, synthetic speech samples converted by the proposed method sounded similar to natural emotional speech, and the differences between degrees of emotion were recognizable.
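
The idea of a degree-dependent spectral-tilt conversion can be illustrated with a minimal sketch. This is not the authors' transfer function, which the abstract does not specify. It assumes, for illustration only, that the tilt change is expressed in dB per octave around a 1 kHz reference and scales linearly with an emotion degree in [0, 1]. The names `MAX_TILT_CHANGE_DB_PER_OCTAVE`, `tilt_gain`, and `convert_spectrum` are hypothetical, and the numeric magnitudes are invented. Only the signs (positive for anger, joy, and hot sadness; negative for cold sadness) mirror the direction of change reported in the abstract.

```python
import math

# Hypothetical tilt change (dB/octave) at full emotion degree.
# Only the signs follow the abstract; the magnitudes are made up.
MAX_TILT_CHANGE_DB_PER_OCTAVE = {
    "anger": 3.0,
    "joy": 2.0,
    "hot_sadness": 1.5,
    "cold_sadness": -2.0,
}


def tilt_gain(freq_hz, emotion, degree, ref_hz=1000.0):
    """Linear amplitude gain at freq_hz for an emotion degree in [0, 1].

    Assumption: the tilt change grows linearly with the degree and
    pivots around ref_hz, where the gain is always 1.
    """
    if not 0.0 <= degree <= 1.0:
        raise ValueError("degree must be in [0, 1]")
    if freq_hz <= 0.0:
        return 1.0  # leave the DC component untouched
    tilt = degree * MAX_TILT_CHANGE_DB_PER_OCTAVE[emotion]
    gain_db = tilt * math.log2(freq_hz / ref_hz)
    return 10.0 ** (gain_db / 20.0)


def convert_spectrum(magnitudes, freqs_hz, emotion, degree):
    """Apply the tilt gain to a magnitude spectrum of neutral speech."""
    return [m * tilt_gain(f, emotion, degree)
            for m, f in zip(magnitudes, freqs_hz)]
```

With a degree of 0 the spectrum is returned unchanged, so "neutral" speech is the starting point of every emotion axis. Larger degrees tilt the spectrum progressively further in the direction assigned to that emotion.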
