The Journal of the Acoustical Society of Japan (日本音響学会誌)
Online ISSN : 2432-2040
Print ISSN : 0369-4232
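Complementarity between the Amount and Rate of Formant Transition in the Perception of Vowels, Semivowels, and Voiced Stops, and Its Discrimination Mechanism (Special Issue: Hearing)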
鈴木 久喜

1974, Vol. 30, No. 3, pp. 169-180

Abstract

Formant frequency motion is often an essential cue for distinguishing phonemes in speech. This study examines the perceptual effect of the amount and rate of formant frequency change, and the signal processing in the auditory system that underlies it. The stimuli were synthetic word-like sounds in which the first formant frequency was varied systematically, as shown in Fig. 8 and eq. (3.1), while the other formants were held constant. Analysis of seven listeners' responses in listening tests with these sounds yielded the following findings:

1. The sounds are heard as one of /aba/, /awa/, /aua/, and /aaa/.
2. Perception of a stop consonant is influenced by the rate of frequency change and is relatively insensitive to the length of the central constant interval. Perception of a semivowel or vowel, by contrast, is strongly influenced by the length of the central constant interval as well as by the rate of frequency change.
3. If the rate of the upward transition differs from that of the downward transition, the perceptual rate of formant motion is expected to lie at some value between the two. The upward transition has a larger influence on the perception of a stop consonant than the downward transition.
4. An increase in the rate of first formant frequency change reduces the amount of frequency change required to switch the identification to a particular consonant. In this respect the amount and the rate of formant change are mutually complementary.
5. Close examination of the phoneme boundaries between /b/, /w/, /u/, and /a/ in these speech-like sounds shows that the boundary loci obtained from the listening tests can be represented by hyperbolic curves in the $(\Delta F, \Delta\dot{F})$ plane:

$$(\Delta F - \Delta F_\theta) = \frac{C}{\Delta\dot{F} - \Delta\dot{F}_\theta} \tag{4.1}$$

where $\Delta F$ is the amount of transition of the first formant frequency, $\Delta\dot{F}$ is the rate of transition of the first formant frequency, and $\Delta F_\theta$, $\Delta\dot{F}_\theta$, and $C$ are constants that depend on the phonemes concerned.

The fact that the phoneme boundaries are given by eq. (4.1) suggests that phonemes can be discriminated by extracting $\Delta F$ and $\Delta\dot{F}$ and deciding whether the inequality

$$(\Delta F - \Delta F_\theta)(\Delta\dot{F} - \Delta\dot{F}_\theta) - C > 0 \tag{6.3}$$

is satisfied. The decision operation must also be describable in terms of "processed" signals in an auditory domain, that is, in terms of something like physiological or psychological quantities, and it is presumed to be simpler in that domain than in the acoustic domain. A processor whose operation is defined by

$$f(t) = F(t) + \frac{1}{2R_M} \sum_{R=1}^{R_M} \left\{ K_1 \exp(-T_C \cdot R)\,\bigl(F(t) - F(t - R)\bigr) + K_2 \exp(-T_C \cdot R)\,\bigl(F(t) - F(t - R)\bigr) \right\} \tag{6.5}$$

has been postulated as the conversion, for speech perception, from a signal $F(t)$ as a function of time $t$ in the acoustic domain to a signal $f(t)$ in the auditory domain. The effects of changing the time span $R_M$, the time constant $T_C$, and the rate of formant frequency transition in the input signal were examined by computer simulation. With parameter values such as $R_M = 8$ ($\approx$ 80 msec), $T_C = 0.15$ to $0.08$ ($\approx$ 100 msec), and $K_1 = K_2 = 2$, the phoneme boundary between /w/ and /b/ takes a nearly square shape in the $(\Delta f, \Delta\dot{f})$ plane.

From these findings it is concluded that phoneme discrimination can be performed on the processed signals by two simple threshold logics, one for $\Delta f$ and the other for $\Delta\dot{f}$, instead of evaluating the rather complex inequality (6.3), and that the processor has a short-time memory of about $2R_M$ ($\approx$ 160 msec), a delayed response of about $R_M$ ($\approx$ 80 msec), and a time constant $T_C$ of about 100 msec.
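The contrast between the hyperbolic decision of inequality (6.3) and the two-threshold decision proposed for the processed signals can be illustrated with a minimal Python sketch. Everything numeric here is an assumption made for illustration: the abstract gives no values for the offsets or for C, so the constants below are hypothetical placeholders, as are the names `HyperbolicBoundary` and `two_threshold_decision`. Restricting the test to the branch where both offset differences are positive is also an assumption of this sketch, not something stated in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperbolicBoundary:
    """Phoneme boundary of the form (amount - a0) * (rate - r0) = C.

    The three constants are hypothetical placeholders, not values
    taken from the paper.
    """
    amount_offset: float  # a0, offset on the amount of F1 transition
    rate_offset: float    # r0, offset on the rate of F1 transition
    c: float              # C, the hyperbola constant

    def required_amount(self, rate: float) -> float:
        """Amount of transition lying on the boundary for a given rate."""
        if rate <= self.rate_offset:
            raise ValueError("rate must exceed rate_offset on this branch")
        return self.amount_offset + self.c / (rate - self.rate_offset)

    def is_beyond(self, amount: float, rate: float) -> bool:
        """Evaluate the inequality on the branch where both terms are positive."""
        da = amount - self.amount_offset
        dr = rate - self.rate_offset
        return da > 0 and dr > 0 and da * dr - self.c > 0


def two_threshold_decision(amount: float, rate: float,
                           amount_threshold: float,
                           rate_threshold: float) -> bool:
    """Square-shaped boundary: two independent threshold tests, both required."""
    return amount > amount_threshold and rate > rate_threshold


# Hypothetical constants, chosen only to make the arithmetic easy to follow.
boundary = HyperbolicBoundary(amount_offset=100.0, rate_offset=2.0, c=400.0)

# Complementarity: a faster transition needs a smaller amount of change.
print(boundary.required_amount(6.0))
print(boundary.required_amount(10.0))

# The same stimulus judged by the hyperbolic rule and by two thresholds.
print(boundary.is_beyond(300.0, 6.0))
print(two_threshold_decision(300.0, 6.0, amount_threshold=200.0, rate_threshold=4.0))
```

The hyperbolic rule couples the two measurements through a product, whereas the two-threshold rule tests each one separately, which is the simplification the abstract attributes to working in the processed domain.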

© 1974 The Acoustical Society of Japan