IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Regular Section
Non-Native Text-to-Speech Preserving Speaker Individuality Based on Partial Correction of Prosodic and Phonetic Characteristics
Yuji OSHIMAShinnosuke TAKAMICHITomoki TODAGraham NEUBIGSakriani SAKTISatoshi NAKAMURA
Author information
JOURNAL FREE ACCESS

2016 Volume E99.D Issue 12 Pages 3132-3139

Details
Abstract

This paper presents a novel non-native speech synthesis technique that preserves the individuality of a non-native speaker. Cross-lingual speech synthesis based on voice conversion or Hidden Markov Model (HMM)-based speech synthesis is a technique to synthesize foreign language speech using a target speaker's natural speech uttered in his/her mother tongue. Although the technique holds promise to improve a wide variety of applications, it tends to cause degradation of target speaker's individuality in synthetic speech compared to intra-lingual speech synthesis. This paper proposes a new approach to speech synthesis that preserves speaker individuality by using non-native speech spoken by the target speaker. Although the use of non-native speech makes it possible to preserve the speaker individuality in the synthesized target speech, naturalness is significantly degraded as the synthesized speech waveform is directly affected by unnatural prosody and pronunciation often caused by differences in the linguistic systems of the source and target languages. To improve naturalness while preserving speaker individuality, we propose (1) a prosody correction method based on model adaptation, and (2) a phonetic correction method based on spectrum replacement for unvoiced consonants. The experimental results using English speech uttered by native Japanese speakers demonstrate that (1) the proposed methods are capable of significantly improving naturalness while preserving the speaker individuality in synthetic speech, and (2) the proposed methods also improve intelligibility as confirmed by a dictation test.

Content from these authors
© 2016 The Institute of Electronics, Information and Communication Engineers
Previous article Next article
feedback
Top