Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
32nd (2018)
Session ID : 2N1-01

Expressive Speech Synthesis through modeling the variety of expressions by Variational Autoencoder
*Kei AKUZAWA, Iwasawa YUSUKE, Matsuo YUTAKA
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Recent advances in deep autoregressive generative modeling have improved the performance of speech synthesis (SS). However, how to make deep autoregressive SS systems expressive remains an open issue, because these models lack the ability to model global characteristics of speech (such as speaker individuality or speaking style). In this paper, we propose a model called VAE-Loop, which integrates a variational autoencoder (VAE) with VoiceLoop, one of the autoregressive speech synthesis models. Unlike traditional SS with autoregressive modeling, the proposed method explicitly models the global characteristics of speech with the VAE, enabling control of the expressiveness of the synthesized speech. Experiments on VCTK and Blizzard2012 showed that the VAE helps VoiceLoop generate higher-quality speech and control expressions by learning these global characteristics.
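
The abstract does not state the training objective, so the following is only a minimal sketch of the generic VAE ingredients that a global latent variable of this kind usually relies on: sampling a latent vector with the reparameterization trick and penalizing it with a KL term against a standard normal prior. The function names, the diagonal-Gaussian posterior, and the standard normal prior are assumptions made for illustration; they are not taken from the paper, and how the latent is fed into VoiceLoop is not shown.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).

    mu and log_var are per-dimension posterior parameters (hypothetical
    encoder outputs for one utterance).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def negative_elbo(reconstruction_loss, mu, log_var):
    """Generic VAE loss: reconstruction term plus KL regularizer.

    reconstruction_loss stands in for whatever loss the synthesizer
    would report for the utterance (assumed, not from the paper).
    """
    return reconstruction_loss + kl_to_standard_normal(mu, log_var)


# Illustrative values only: a 2-dimensional global latent for one utterance.
rng = random.Random(0)
mu = [0.5, -1.0]
log_var = [0.0, 0.0]
z = sample_latent(mu, log_var, rng)
loss = negative_elbo(1.0, mu, log_var)
```

Under these assumptions, control over expressiveness at synthesis time would amount to choosing or interpolating the latent vector instead of sampling it, which is the usual way a VAE latent is manipulated.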

© 2018 The Japanese Society for Artificial Intelligence