Speech Replica Based on the Connection Model

Hideki YOSHIDA; Kazutomo YUNOKUCHI; Masahiro NAKANO; Toru YUKIMASA

doi:10.24466/ijbschs.21.2_1

Abstract

This study seeks the simplest structure of a universal function for generating synthetic speech. Instead of the conventional filter model, which reproduces multiple resonances based on the articulatory organ, we take an inverse approach that extracts speech information in the manner of the auditory organ. This approach yielded an intelligibility of more than 90% at an extraction rate of 59 kbit/s, although the quality of the synthesized speech remained mediocre. We interpret these results as strong evidence that envelope maxima and the maximal points of a bandpass-filtered waveform play essential roles in preserving the intelligibility and the timbre of synthetic speech, respectively. The connection model of one-term cosine functions, which uses linear approximations for both the slowly varying amplitude envelope and the periodicity of the inner fine structure, contributes to the synthesis of the lower spectral bands. In the higher spectral bands (> 1 kHz), however, rapid and exceptional variation causes degradation, which can be estimated in advance by an index of phase error.
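One possible reading of the connection model for a single band can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the formulas, so the function names `lerp_table` and `synthesize_band`, the choice of letting the cosine phase advance by exactly 2π between consecutive waveform maxima, and the straight-line interpolation of both phase and amplitude are all hypothetical.

```python
import math
from bisect import bisect_right


def lerp_table(x, xs, ys):
    """Piecewise-linear interpolation of the points (xs, ys) at x, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    frac = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + frac * (ys[i + 1] - ys[i])


def synthesize_band(peak_times, env_times, env_amps, sample_rate):
    """Rebuild one band as a single cosine term a(t) * cos(phi(t)).

    Assumptions (hypothetical, for illustration only):
      - peak_times: times (s) of the maximal points of the bandpass-filtered waveform
      - env_times, env_amps: times (s) and heights of the envelope maxima
      - phi(t) is linear between consecutive peaks and equals 2*pi*k at peak k,
        so each cosine segment connects to the next at a waveform maximum
      - a(t) is linear between consecutive envelope maxima
    """
    peak_phases = [2.0 * math.pi * k for k in range(len(peak_times))]
    start, stop = peak_times[0], peak_times[-1]
    n_samples = int(round((stop - start) * sample_rate)) + 1
    samples = []
    for i in range(n_samples):
        t = start + i / sample_rate
        phase = lerp_table(t, peak_times, peak_phases)
        amp = lerp_table(t, env_times, env_amps)
        samples.append(amp * math.cos(phase))
    return samples


# Toy input: three periods of unequal length under a decaying envelope.
band = synthesize_band(
    peak_times=[0.0, 0.125, 0.25, 0.5],
    env_times=[0.0, 0.5],
    env_amps=[1.0, 0.5],
    sample_rate=64,
)
```

In this sketch the only stored information per band is the list of peak times and the list of envelope maxima, which is the kind of sparse description the abstract attributes to the inverse approach. Where the period changes quickly between neighboring peaks, the straight-line phase is a coarser fit, which is consistent with the degradation the abstract reports above 1 kHz.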
