Expressive Speech Synthesis through Modeling the Variety of Expressions by Variational Autoencoder

Kei Akuzawa; Yusuke Iwasawa; Yutaka Matsuo

doi:10.11517/pjsai.JSAI2018.0_2N101

32nd (2018)

Session ID : 2N1-01

Conference information

Host: The Japanese Society for Artificial Intelligence

Name : The 32nd Annual Conference of the Japanese Society for Artificial Intelligence, 2018

Number : 32

Date : June 05, 2018 - June 08, 2018

Expressive Speech Synthesis through modeling the variety of expressions by Variational Autoencoder

*Kei AKUZAWA, Yusuke IWASAWA, Yutaka MATSUO

Abstract

Recent advances in deep autoregressive generative modeling have improved the performance of speech synthesis (SS). However, how to make deep autoregressive SS systems expressive remains an open issue, because these models lack the ability to model the global characteristics of speech (such as speaker individuality or speaking style). In this paper, we propose a model called VAE-Loop, which integrates a variational autoencoder (VAE) with VoiceLoop, an autoregressive speech synthesis model. Unlike conventional autoregressive SS, the proposed method explicitly models the global characteristics of speech with a VAE, enabling control over the expressiveness of the synthesized speech. Experiments on VCTK and Blizzard2012 showed that the VAE helps VoiceLoop generate higher-quality speech and control expression by learning the global characteristics.
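The abstract does not state the training objective, so the following is only a minimal sketch of the standard VAE ingredients that a model of this kind would plausibly rely on: a global latent code sampled with the reparameterization trick and a KL regularizer toward a standard normal prior. The diagonal Gaussian posterior, the standard normal prior, the function names, and the numeric values are all assumptions made for illustration; none of them are taken from the paper.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def negative_elbo(recon_error, mu, log_var):
    """Loss = reconstruction error + KL term (recon_error is a stand-in value)."""
    return recon_error + kl_to_standard_normal(mu, log_var)


# Hypothetical encoder output for one utterance (2-dimensional global latent).
mu, log_var = [0.5, -0.2], [0.0, -1.0]
rng = random.Random(0)

# z plays the role of the utterance-level code (speaker, style) that would
# condition the autoregressive decoder at every step.
z = sample_latent(mu, log_var, rng)
loss = negative_elbo(recon_error=1.25, mu=mu, log_var=log_var)
```

In such a setup, expressiveness would be controlled at synthesis time by choosing `z` directly (for example, by encoding a reference utterance) instead of sampling it.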
