A Covariance-Tying Technique for HMM-Based Speech Synthesis

Keiichiro OURA; Heiga ZEN; Yoshihiko NANKAKU; Akinobu LEE; Keiichi TOKUDA

doi:10.1587/transinf.E93.D.595

Regular Section



Keywords: HMM, speech synthesis, decision tree, context clustering, MDL criterion, embedded device

Journal, free access

2010, Volume E93.D, Issue 3, pp. 595-601



Abstract

This paper describes a technique for reducing the footprint of HMM-based speech synthesis systems by tying all covariance matrices of the state distributions. HMM-based speech synthesis systems usually have smaller footprints than unit-selection synthesis systems because they store statistics rather than speech waveforms. However, further reduction is essential for deploying them on embedded devices, which have limited memory. Based on the empirical observation that covariance matrices affect the quality of synthesized speech less than mean vectors do, we propose a technique for clustering mean vectors while tying all covariance matrices. Subjective listening test results showed that the proposed technique can shrink the footprint of an HMM-based speech synthesis system while retaining the quality of the synthesized speech.
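
To make the footprint argument concrete, the following is a minimal sketch of how tying covariances changes the number of stored parameters. It is not the authors' implementation. It assumes diagonal covariance matrices, and the leaf count, feature dimension, occupancy counts, and variances are hypothetical values chosen only for illustration. The function names `footprint` and `pooled_variance` are likewise hypothetical. The shared variance is computed as the occupancy-weighted average of per-leaf variances, which is the standard pooled estimate when the means are held fixed; the MDL-based clustering of mean vectors described in the abstract is not shown.

```python
def footprint(num_leaves, dim, tied):
    """Count stored floats for diagonal-covariance Gaussian leaves.

    Each leaf always stores its own mean vector. With tying, a single
    variance vector is shared by all leaves; without tying, every leaf
    stores its own variance vector.
    """
    means = num_leaves * dim
    variances = dim if tied else num_leaves * dim
    return means + variances


def pooled_variance(counts, variances):
    """Occupancy-weighted average of per-leaf diagonal variances."""
    total = sum(counts)
    dim = len(variances[0])
    return [
        sum(n * v[d] for n, v in zip(counts, variances)) / total
        for d in range(dim)
    ]


# Hypothetical model size: 1000 leaves, 40-dimensional features.
untied = footprint(num_leaves=1000, dim=40, tied=False)
tied = footprint(num_leaves=1000, dim=40, tied=True)

# Hypothetical statistics for two leaves with 2-dimensional features.
shared = pooled_variance([10, 30], [[1.0, 2.0], [3.0, 4.0]])

print(untied, tied, shared)
```

Under these assumptions the variance storage no longer grows with the number of leaves, so nearly all of the remaining footprint is spent on mean vectors.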
