Abstract
This study seeks the simplest structure of a universal function for generating synthetic speech. Instead of the conventional filter model, which reproduces multiple resonances based on the articulatory organs, an inverse approach is taken that extracts speech information in the manner of the auditory organ. Even this approach yields an intelligibility of more than 90% at an extraction rate of 59 kbit/s, although the quality of the synthesized speech remains mediocre. These results are interpreted as strong evidence that envelope maxima and the maximal points of a bandpass-filtered waveform play essential roles in preserving the intelligibility and the timbre of synthetic speech, respectively. A connection model of one-term cosine functions, with linear approximations of both the slowly varying amplitude envelope and the periodicity of the inner fine structure, contributes to synthesis in the lower spectral bands, whereas rapid and exceptional variations in the higher spectral bands (> 1 kHz) cause degradation, which can be estimated in advance by a phase-error index.
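As an illustration only, the connection model described above might be sketched as follows for a single band: a single cosine whose amplitude and instantaneous frequency are each interpolated linearly between anchor points. The function names `lerp_at` and `synthesize_band`, the anchor-point representation, the sampling rate, and the rectangle-rule phase accumulation are all assumptions made for this sketch, not details taken from the study.

```python
import math
from bisect import bisect_right


def lerp_at(x, xs, ys):
    """Piecewise-linear interpolation of (xs, ys) at x; xs must be increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    w = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + w * (ys[i + 1] - ys[i])


def synthesize_band(anchor_times, anchor_amps, anchor_freqs, fs=16000):
    """Hypothetical single-band synthesis: amp(t) * cos(phase(t)).

    anchor_times : increasing times in seconds (assumed to start at 0)
    anchor_amps  : envelope values at the anchors (linearly connected)
    anchor_freqs : fine-structure frequencies in Hz at the anchors
    """
    n = int(round(anchor_times[-1] * fs)) + 1
    samples = []
    phase = 0.0  # start at a cosine maximum
    for k in range(n):
        t = k / fs
        amp = lerp_at(t, anchor_times, anchor_amps)
        samples.append(amp * math.cos(phase))
        # Advance the phase by the instantaneous frequency (rectangle rule).
        phase += 2.0 * math.pi * lerp_at(t, anchor_times, anchor_freqs) / fs
    return samples


if __name__ == "__main__":
    # A 10 ms segment whose envelope rises while the frequency glides upward.
    y = synthesize_band([0.0, 0.01], [0.0, 1.0], [200.0, 400.0], fs=8000)
    print(len(y), max(abs(v) for v in y))
```

In this sketch the only parameters per segment are the anchor amplitudes and frequencies, which is what makes a linear approximation cheap to encode. It does not attempt to reproduce the phase-error index mentioned in the abstract.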