Proposal of a Japanese-speech-synthesis Method with Dimensional Representation of Emotions based on Prosody as well as Voice-quality Conversion

Shoichi TAKEDA; Yoshiki KABUTA; Tomohiro INOUE; Masashi HATOKO

doi:10.5057/ijae.12.79

Abstract

This paper proposes a Japanese speech-synthesis system capable of expressing variable degrees of emotion through prosody conversion as well as voice-quality conversion. Among voice-quality features, we have found that spectral tilt depends on both the type and the degree of emotion. To date, we have introduced a spectral-tilt conversion rule into our speech-synthesis system. Our previous analyses showed that the spectral-tilt quantities increased as the degrees of “anger”, “joy”, and “crying-type (hot) sadness” increased, whereas they decreased as the degree of “dispirited-and-whispering-type (cold) sadness” increased. We formulate a transfer function that converts the spectral-tilt quantities of “neutral” speech to those of emotional speech at various degrees. The prosody-conversion rules are likewise determined from our previous findings. In informal listening, synthetic-speech samples converted by the proposed method sounded similar to natural emotional speech, and the differences between degrees of emotion were recognizable.
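
The abstract does not give the form of the transfer function, so the following is only a minimal sketch of what a degree-dependent spectral-tilt conversion might look like, not the authors' formulation. It assumes a gain that is linear in dB per octave around a reference frequency and a linear mapping from emotion degree to tilt change. The function names, the `f_ref` parameter, the 0-to-1 degree scale, and the numeric tilt values are all hypothetical, and the sign convention relating this slope to the paper's "spectral-tilt quantity" is likewise an assumption.

```python
import numpy as np


def tilt_for_emotion(degree, max_tilt_db_per_octave):
    """Map an emotion degree in [0, 1] to a tilt change in dB/octave.

    Hypothetical linear rule: a positive maximum raises the tilt with
    degree, a negative maximum lowers it.
    """
    degree = float(np.clip(degree, 0.0, 1.0))
    return degree * max_tilt_db_per_octave


def apply_spectral_tilt(signal, sample_rate, tilt_db_per_octave, f_ref=1000.0):
    """Apply a zero-phase gain of tilt_db_per_octave * log2(f / f_ref) dB."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)

    # Leave the DC bin untouched, since log2(0) is undefined.
    gain_db = np.zeros_like(freqs)
    nonzero = freqs > 0
    gain_db[nonzero] = tilt_db_per_octave * np.log2(freqs[nonzero] / f_ref)

    gain = 10.0 ** (gain_db / 20.0)
    return np.fft.irfft(spectrum * gain, n=len(signal))


if __name__ == "__main__":
    sample_rate = 16000
    t = np.arange(sample_rate) / sample_rate
    neutral = np.sin(2 * np.pi * 2000 * t)  # stand-in for a neutral-speech frame

    # Placeholder maximum of +3 dB/octave, applied at two emotion degrees.
    for degree in (0.5, 1.0):
        tilt = tilt_for_emotion(degree, 3.0)
        converted = apply_spectral_tilt(neutral, sample_rate, tilt)
        print(degree, tilt, float(np.max(np.abs(converted))))
```

In this sketch, an emotion whose tilt quantity falls with degree would simply use a negative maximum. A real system would apply the conversion frame by frame alongside the prosody rules rather than to a whole signal at once.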
