The wall vibration of the vocal tract, represented in an equivalent circuit by the wall impedance z_w=r_s+jωl_s, is considered to contribute to the closed-tract resonance frequency F_<w0>, the formant bandwidths B_i, and the formant frequencies F_i (i=1, 2, …). However, no value of z_w has been settled on, and z_w has frequently been simplified or omitted in transformations between the cross-sectional area function of the vocal tract and the speech signal. The reason is that z_w had not been obtained by direct measurement; instead, it had been chosen to fit one or two physical features of speech. Table 1 lists the values of z_w proposed or employed in earlier works. The purpose of this article is to understand the contribution of z_w to the physical features of speech and to obtain a reasonable z_w by a simple procedure. The speech production system is illustrated in Fig. 1. The vocal tract is divided into acoustic tubes of equal length, and each tube is represented by the equivalent circuit given in Fig. 2 and Eq. (2). In calculating the transfer function of the vocal tract from the equivalent circuit, the glottis is assumed to be closed. F_i is determined as the frequency at which the phase of the transfer function reverses. Seventeen area functions are prepared for the estimation of F_<i, p> and B_<i, p> (p=1 or 3.9 atm). r_s is varied from 0 to 10000, and l_s from 0.2 to 3.8. Fig. 3 shows the relation between z_w and F_<w0>, calculated as the resonance frequency of a Helmholtz resonator. The relation between B_<1, 1> and z_w for a uniform tube is shown in Fig. 4, and Fig. 5 shows the same relation at 3.9 atm. dF, defined by Eq. (4), is the upward shift rate of the first formant frequency relative to the lossless vocal tract. Fig. 6 shows the relation between dF and z_w for three area functions. F_<i, p>, the formant frequency of speech uttered under p atm, is shifted upward in comparison with F_<i, 1>, that in normal air, and the shift is expressed by Eq. (5).
This equation fits the authors' experimental results well, and the experiment shows that F_<w0> is 195 Hz. △F, the difference between F_<1, p> and F_<1, 1>, is calculated for a uniform tube with various z_w and shown in Fig. 7. The ranges of r_s and l_s can be inferred from Figs. 3, 4, 5 and 7, but it is almost impossible to determine a reasonable z_w from them alone. F_<1, 3.9> calculated by Eq. (5) with F_<w0>=195 Hz is therefore compared with F_<1, 3.9> calculated from the seventeen area functions with various z_w. The difference is evaluated by the mean square error, and z_w=1400+jω1.6 gives the least error (this z_w is called z_<ws> hereinafter). Fig. 8 shows the relation between F_<1, 3.9> and F_<1, 1> calculated by Eq. (5) and from the area functions with z_<ws>. z_<ws> is comparatively close to the z_w measured directly by Ishizaka et al. (see Table 1). Meanwhile, the sweep-tone method shows that F_<w0> lies between 150 and 200 Hz. Applying z_<ws> to the area function drawn in Fig. 3 gives F_<w0>=177 Hz. B_i calculated from the seventeen area functions with z_<ws> are shown in Fig. 10. B_1 in this figure fits well with the bandwidth obtained by the sweep-tone method. Table 3 shows F_i and B_i (i=1, 2, 3) calculated from the area functions drawn in Fig. 10, using two kinds of z_w. It indicates that if the historical z_w=6500+jω0.4 is used, a low F_1 and a wide B_1 are obtained. Fig. 11 shows the formant pattern as a function of the position of the constriction in a uniform tube for two kinds of z_w. This figure suggests that an inadequate z_w will give the wrong place of articulation in the transformation from speech signal to area function. It is conclusively said that
(View PDF for the rest of the abstract.)
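
As a rough illustration of how the wall mass l_s sets F_<w0> and how pressure shifts the first formant, the following Python sketch may help. It is not the authors' procedure: Eqs. (4) and (5) are not reproduced in this abstract, so the shift rule F_p^2 = F_1^2 + (p - 1) F_w0^2 used here is an assumed, textbook-style form. The CGS units for l_s, the air constants, the circular 5 cm^2 cross-section, and both function names are likewise assumptions made only for this example. The wall resistance r_s is ignored because, in this simplified model, it affects damping rather than the resonance frequency.

```python
import math

# Assumed constants (not given in the abstract), CGS units.
RHO_AIR = 1.14e-3   # g/cm^3, air density at 1 atm inside the vocal tract
C_SOUND = 3.5e4     # cm/s, speed of sound inside the vocal tract


def closed_tract_resonance(l_s, area_cm2, pressure_atm=1.0):
    """Resonance (Hz) of the wall mass against the air compliance.

    Hypothetical helper: treats a closed uniform tube with circular
    cross-section as a Helmholtz-type resonator. l_s is the wall mass
    per unit area (assumed g/cm^2); r_s is neglected.
    """
    perimeter = 2.0 * math.sqrt(math.pi * area_cm2)  # wall length per cm of tube
    rho = RHO_AIR * pressure_atm                     # density scales with pressure
    # Per unit tube length: wall acoustic mass = l_s / perimeter,
    # air acoustic compliance = area / (rho * c^2).
    omega_sq = rho * C_SOUND ** 2 * perimeter / (l_s * area_cm2)
    return math.sqrt(omega_sq) / (2.0 * math.pi)


def pressurized_formant(f_1atm, f_w0, pressure_atm):
    """Assumed shift rule: F_p^2 = F_1^2 + (p - 1) * F_w0^2."""
    return math.sqrt(f_1atm ** 2 + (pressure_atm - 1.0) * f_w0 ** 2)


if __name__ == "__main__":
    f_w0 = closed_tract_resonance(l_s=1.6, area_cm2=5.0)
    print(f"F_w0 for l_s = 1.6, 5 cm^2 tube: {f_w0:.0f} Hz")
    f_p = pressurized_formant(f_1atm=500.0, f_w0=195.0, pressure_atm=3.9)
    print(f"F_1 = 500 Hz at 1 atm becomes about {f_p:.0f} Hz at 3.9 atm")
```

Under these assumptions, l_s = 1.6 gives a closed-tract resonance inside the 150 to 200 Hz range quoted above, and raising the pressure to 3.9 atm scales that resonance by sqrt(3.9). A grid search over (r_s, l_s) like the one described in the abstract would additionally require the full transmission-line model of Fig. 2 and Eq. (2), which this sketch does not attempt.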