The Journal of the Acoustical Society of Japan (日本音響学会誌)
Online ISSN : 2432-2040
Print ISSN : 0369-4232
Vol. 30, No. 3
Showing 1-9 of 9 articles in the selected issue
  • 氏原 淳一, 境 久雄
    Manuscript type: Article
    1974, Vol. 30, No. 3, pp. 133-143
    Published: 1974/03/01
    Released: 2017/06/02
    Journal, free access
    The signal processing performed by the auditory nervous system for monosyllabic vowels was investigated by means of an electronic auditory model. The model consists of pre-emphasis, basilar membrane, hair cells, and primary and secondary neurons (Fig. 1). Each neuron has lateral connections so that it has the response area and inhibitory areas observed in physiological experiments (Figs. 10, 18). (1) The response of the neurons to vowels shows a pattern connecting several peaks. The characteristic frequency (CF) of the neurons located at the peaks of the pattern corresponds approximately to the formant frequencies of the vowels (Figs. 2, 3, 4). When adjacent formant frequencies approach each other beyond the frequency resolution of the nervous system, the response changes from bimodal to unimodal (Figs. 4, 6, 7). (2) The rippled waveform pulsating with the pitch period is transmitted in neurons whose CF is much higher than the pitch frequency (Figs. 4, 5, 8, 9, 10, 11, 12, 13). This is because inhibition is not effective for the AC component of the response. Therefore, pitch information appears strikingly in the secondary neurons even though lateral inhibition narrows the frequency characteristic band. This property is very different from that of an ordinary frequency analyzer (Fig. 14). (3) The response of the neurons is produced as the result of mutual inhibitory action between formant components. In this case, the inhibition due to the lower formant frequency component works effectively, owing to the asymmetry of the inhibitory area of the neurons and the decrease of the higher frequency components in speech sounds (Figs. 19, 20). (4) Two kinds of frequency emphasis were compared as preprocessing. Pre-emphasis (I) was designed to give approximately uniform hair-cell output for speech sounds, in order to make up for the insufficient intensity range (Fig. 16, PE (I)), and Pre-emphasis (II) was set to give equal output of the nervous system for an input signal whose frequency characteristic is similar to the equal-loudness curve of 30 phons (Fig. 16, PE (II)). With Pre-emphasis (II), the inhibition due to the lower frequency component decreases (Fig. 17), and in the response to vowels the difference between /u/ and /o/ is clearly noticeable (Fig. 20). In addition, the response to the pitch component decreases or does not appear (Fig. 20). Considering this result together with (2), it is suggested that pitch information is processed as temporal information rather than spatial information. (5) An investigation of the variation of the response across ten speakers verified that the five vowels are characterized by the peak positions of the response pattern.
  • 難波 精一郎, 桑野 園子, 加藤 徹
    Manuscript type: Article
    1974, Vol. 30, No. 3, pp. 144-150
    Published: 1974/03/01
    Released: 2017/06/02
    Journal, free access
    To understand the temporal characteristics of hearing, it is important to investigate the effects of the various temporal factors of stimuli. The rise time of a stimulus is one such factor. Several experiments have been conducted on the effect of rise time on loudness, but the results have not always been conclusive. Our previous experiments on level-fluctuating noises show that loudness is determined by the energy of the noise. It is therefore also necessary to investigate the effect of rise time in terms of energy. In this experiment we investigate the effect of rise time on loudness using the stimuli shown in Fig. 1-c. White noise is used as the stimulus and is presented monaurally to subjects seated alone in a sound-proof room. All stimulus conditions (intensity, duration, rise time and stimulus interval) are automatically controlled and presented by the Programmable Sound Control System (PSCS), which was developed in our laboratory. Three intensity levels (60, 75 and 90 dB SPL) and three rise times (100, 300 and 500 msec) are used for the standard stimulus (Ss), and a steady-state noise with the same SPL or the same duration as Ss is used as the comparison stimulus (Sc). In Exp. 1 the duration of Sc is held constant and subjects match the loudness of Sc to that of Ss by controlling the SPL of Sc. In Exp. 2 and 3 the SPL of Sc is held constant and the duration is varied, by the experimenter in Exp. 2 (method of limits) and by the subjects themselves in Exp. 3 (method of adjustment). The results show that in all conditions the PSE is in good agreement with Ss when both are converted into energy. It may therefore be concluded that the effect of rise time on loudness is attributable to the change in energy. There is, however, an invariable, though slight, difference among the PSEs obtained in Exp. 1, 2 and 3: the PSE obtained in Exp. 2 and 3 is louder than that in Exp. 1 in all conditions. This may be due to a change in the attribute of sensation, since the duration of Sc in Exp. 2 and 3 is very short. A supplementary experiment on the change in the attribute of sensation of short noises, using the semantic differential, shows that a very short noise is judged sharp, though soft. Therefore, when the duration of Sc is varied, the judgment of loudness may be affected by the impression of sharpness, causing the overestimation of Ss in Exp. 2 and 3.
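    The comparison "after conversion into energy" can be illustrated for the steady-state comparison stimulus with a minimal sketch. It assumes the usual relation that the energy level of a constant-level noise is its SPL plus 10 log10 of its duration in seconds. The function names are hypothetical, and the rising standard stimulus of Fig. 1-c is not modeled because the abstract does not give its envelope.

```python
import math


def energy_level_db(spl_db, duration_s):
    """Energy level (dB re 1 s) of a steady noise: SPL + 10*log10(duration)."""
    return spl_db + 10.0 * math.log10(duration_s)


def equal_energy_spl(energy_db, duration_s):
    """SPL a steady noise of fixed duration needs to carry the given energy
    (the quantity adjusted in Exp. 1)."""
    return energy_db - 10.0 * math.log10(duration_s)


def equal_energy_duration(energy_db, spl_db):
    """Duration in seconds a steady noise of fixed SPL needs to carry the
    given energy (the quantity varied in Exp. 2 and 3)."""
    return 10.0 ** ((energy_db - spl_db) / 10.0)
```

    Under this assumption, halving the duration of Sc at a fixed energy requires raising its SPL by about 3 dB.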
  • 中林 克巳
    Manuscript type: Article
    1974, Vol. 30, No. 3, pp. 151-160
    Published: 1974/03/01
    Released: 2017/06/02
    Journal, free access
    This paper describes sound localization in the horizontal plane. Fig. 1 shows the loudspeaker arrangement used for the directional hearing experiments. An observer sits on a chair with a headrest, as shown in Fig. 2, with the head lightly supported by the headrest, and reports the perceived direction of the presented sound source. The results of the directional hearing experiments for one-octave band noises can be divided into two groups (Fig. 3, 61%, and Fig. 4, 35%). The perceived direction of the observers of Fig. 3 is influenced by the signal frequency and the sound pressure level, but that of the observers of Fig. 4 is not influenced by these factors. Table 1 shows the percentage of observers for each signal frequency and each judgement type. It is noteworthy that only the 8-16 kHz one-octave band noise gives the correct judgement (74%). For the other four signals, the misjudgements are influenced by the signal frequency and the sound pressure level. For the phantom sound source (see Fig. 7), the directional hearing for one-octave band noise is shown in Fig. 8, and many misjudgements occur. It is also noteworthy that phantom sound sources in the directions between 45° and 135° are not perceived between 45° and 135° but at 45° or 135°, and that this phenomenon still occurs even when the bandwidth of the signal is widened to two octaves (Fig. 9). To find the factors that are effective for correct judgement, three additional experiments on the directional hearing of real sound sources were tried: with two-octave band noise (Fig. 5), with one-octave band noise to which a certain amount of 8-16 kHz one-octave band noise was added (Fig. 10), and with a male voice cut off by a low-pass filter (Fig. 6). From these experiments, it may be said that a sufficient amount of the 8-16 kHz component and a widening of the signal band are both effective for correct judgement, and that the former is far more effective than the latter. Table 2 shows, for each one-octave band noise, the percentage of observers whose perceived direction is influenced by the sound pressure level. The ratio of observers whose perceived direction is influenced for at least two kinds of one-octave band noise is 53%. The relationship between the perceived direction and the sound pressure level is shown in Fig. 11 as a typical example. The perceived direction is determined independently of the loudspeaker direction. The last problem is the perceived direction of phantom sound sources in the directions between 45° and 135°. This phenomenon can be explained by calculating ΔP and Δψ, where ΔP is the difference in sound pressure level and Δψ is the difference in phase at the entrances of the two external auditory canals (see Fig. 12 and Eqs. 1-8). The results mentioned above were obtained with the observers listening in an anechoic chamber with their heads supported. Fig. 13 shows the relative frequencies of the directional hearing of 1/3-octave band noise under ordinary conditions, without a headrest and in a laboratory room. Many misjudgements still occur. These results and discussions can be applied to recording techniques for 4-channel stereo.
  • 中山 剛, 三浦 種敏, 上坂 脩, 佐藤 栄治
    Manuscript type: Article
    1974, Vol. 30, No. 3, pp. 161-168
    Published: 1974/03/01
    Released: 2017/06/02
    Journal, free access
    To investigate the subjective effect of inter-channel phase difference when listening to synthesized sources, eight loudspeakers were arranged in an anechoic chamber as shown in Fig. 1. Using only the loudspeakers at BL, FL, FR and BR, one adjacent pair (for example FL-BL) was excited at equal level, and only the phase difference was varied, to 22.5°, 45°, 90° and 180°. The subjects were asked to report the effect on the bearing-angle width (so-called optic angle) of the sound image, their subjective distance from the sound image, and the angle of elevation of the sound image from the horizontal plane, assuming its location at the midpoint of the loudspeakers (L for the FL-BL pair, for example). They were also asked to evaluate the quality of the aural sensation, ranging from no change in the quality of the sound image, through a small change in quality and a change in quality causing some feeling of oppression (oppressive effect), up to an extremely oppressive sensation. Fig. 6 shows the rating of the oppressive sensation for actual sources in various directions. It serves as a standard for comparison with the oppressive sensation rated when a phase difference exists for the various loudspeaker pairs, as shown in Fig. 7. Fig. 8 shows the standard response (actual sound source) for sound image width (optic angle), and this should be compared with Fig. 9, which shows the effect of the phase difference for the synthesized sources. The data in Fig. 9 are median values of the judged width, while the height of the ordinates shows the statistical deviation in the judgment of width, expressed as the interquartile range. The subjective judgment of distance to the sound image was also obtained. The standard response to the actual sound sources is shown in Fig. 10, and the data for the source synthesized with two loudspeakers are shown in Fig. 11. The angle of elevation of the sound image from the horizontal plane is shown in Fig. 12. In conclusion, it became clear that the allowable limit of the phase difference is 45° in every case, and that the subjective effects are greatest in the front quadrant and smallest in the side quadrants.
  • 鈴木 久喜
    Manuscript type: Article
    1974, Vol. 30, No. 3, pp. 169-180
    Published: 1974/03/01
    Released: 2017/06/02
    Journal, free access
    Formant frequency motion is often an essential cue for distinguishing phonemes in speech sounds. The purpose of this research is to examine the perceptual effect of changes in the amount and rate of formant frequency change, and the signal processing in the auditory system. Synthetic word-like sounds, whose first formant frequency was changed systematically as shown in Fig. 8 and eq. (3.1) while the other formants were kept constant, were used as the experimental material. Analysis of 7 listeners' responses in the hearing test of the word-like sounds gives the following findings: 1. These sounds are heard as one of /aba/, /awa/, /aua/ and /aaa/. 2. The perception of a stop consonant is influenced by the rate of frequency change and is relatively insensitive to changes in the length of the central constant interval. The perception of a semi-vowel or vowel, on the contrary, is influenced largely by the length of the central constant interval as well as by the rate of frequency change. 3. If the rate of the upward transition differs from that of the downward transition, a perceptual rate of formant motion is to be expected at some value between the upward and the downward rates. The upward transition has a larger influence on the perception of a stop consonant than the downward transition. 4. An increase in the rate of first formant frequency change reduces the amount of frequency change required to switch the identification to a particular consonant. In this respect there is a mutually complementary effect between the amount and the rate of formant change. 5. Close examination of the phoneme boundaries between /b/, /w/, /u/ and /a/ in these speech-like sounds shows that the loci of the phoneme boundaries obtained from listening tests can be represented by a hyperbolic curve in the $(\Delta F, \Delta\dot{F})$ plane: $(\Delta F - \Delta F_\theta) = C / (\Delta\dot{F} - \Delta\dot{F}_\theta)$ (4.1), where $\Delta F$ is the amount of transition of the first formant frequency, $\Delta\dot{F}$ is the rate of transition of the first formant frequency, and $\Delta F_\theta$, $\Delta\dot{F}_\theta$ and $C$ are constants depending on the phonemes concerned. The fact that the phoneme boundaries are given by eq. (4.1) suggests that phonemes can be discriminated by extracting $\Delta F$ and $\Delta\dot{F}$ and deciding whether the inequality $(\Delta F - \Delta F_\theta)(\Delta\dot{F} - \Delta\dot{F}_\theta) - C > 0$ (6.3) is satisfied. The decision operation must also be describable in terms of "processed" signals in the auditory domain, such as physiological or psychological quantities, and is supposed to be simpler in such a domain than in the acoustic domain. A processor whose operation is defined by $f(t) = F(t) + \frac{1}{2R_M} \sum_{R=1}^{R_M} \{ K_1 \exp(-T_C R)\,(F(t) - F(t - R)) + K_2 \exp(-T_C R)\,(F(t) - F(t - R)) \}$ (6.5) has been postulated as a conversion for speech perception from a signal $F(t)$, a function of time $t$ in the acoustic domain, to a signal $f(t)$ in the auditory domain. The effects of changes in the time span $R_M$, the time constant $T_C$, and the rate of transition of the formant frequency in the input signal were examined by computer simulation. When the parameters have values such as $R_M = 8$ (≈ 80 msec), $T_C = 0.15$ to $0.08$ (≈ 100 msec) and $K_1 = K_2 = 2$, the phoneme boundary between /w/ and /b/ takes a nearly square shape in the $(\Delta f, \Delta\dot{f})$ plane. From these findings, it is concluded that phoneme discrimination can be made on the processed signals by two simple threshold logics, one for $\Delta f$ and the other for $\Delta\dot{f}$, instead of calculating the rather complex inequality (6.3), and that the processor has a short-time memory of about $2R_M$ (≈ 160 msec), a delayed response of about $R_M$ (≈ 80 msec), and a time constant $T_C$ of about 100 msec.
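    The boundary test of inequality (6.3) and the processor of eq. (6.5) can be sketched as follows. This is a minimal illustration, not the author's simulation. The function names, the default time constant of 0.1 (chosen from within the quoted range), and the handling of samples before the start of the signal (held at the first value) are assumptions. The sum is transcribed as printed in the abstract, so both terms use the same difference F(t) - F(t - R).

```python
import math


def crosses_boundary(amount, rate, amount_0, rate_0, c):
    """Inequality (6.3): True when (amount - amount_0) * (rate - rate_0) - c > 0.

    amount and rate stand for the amount and rate of the first-formant
    transition; amount_0, rate_0 and c are the phoneme-dependent constants.
    """
    return (amount - amount_0) * (rate - rate_0) - c > 0


def process(signal, r_m=8, t_c=0.1, k1=2.0, k2=2.0):
    """Eq. (6.5) applied to a sampled formant track (one sample per time step).

    Assumption: samples before the start of the track equal signal[0].
    """
    out = []
    for t, current in enumerate(signal):
        acc = 0.0
        for r in range(1, r_m + 1):
            past = signal[t - r] if t - r >= 0 else signal[0]
            weight = math.exp(-t_c * r)
            diff = current - past
            # Both terms are written out separately to mirror the printed sum.
            acc += k1 * weight * diff + k2 * weight * diff
        out.append(current + acc / (2 * r_m))
    return out
```

    With these definitions a constant track passes through unchanged, while a step in the track is exaggerated for the following r_m samples, which is the kind of transition emphasis the abstract attributes to the processor.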
  • 印東 太郎
    Manuscript type: Article
    1974, Vol. 30, No. 3, pp. 181-188
    Published: 1974/03/01
    Released: 2017/06/02
    Journal, free access
  • 今井 秀雄
    Manuscript type: Article
    1974, Vol. 30, No. 3, pp. 189-192
    Published: 1974/03/01
    Released: 2017/06/02
    Journal, free access
  • 安野 友博, 井上 靖二, 大山 孜郎
    Manuscript type: Article
    1974, Vol. 30, No. 3, pp. 193-201
    Published: 1974/03/01
    Released: 2017/06/02
    Journal, free access
  • 中島 博美
    Manuscript type: Article
    1974, Vol. 30, No. 3, pp. 202-206
    Published: 1974/03/01
    Released: 2017/06/02
    Journal, free access