When one of the dominant harmonics (the fundamental frequency and its harmonic components) is close to the first formant frequency, source-filter interaction can induce a voice register transition, in which the vocal-fold vibration becomes unstable and the pitch jumps abruptly. We investigated the relationship between the dominant harmonics, the first formant frequency, and the pitch jump width in the modal-falsetto transition to examine the effect of source-filter interaction. We measured temporal patterns of the fundamental frequency and the first formant while subjects performed rising glissandi on the vowels /a/ and /i/. For the /a/ vowel, the dominant harmonics and the first formant were only weakly related in proximity during the transition, indicating that a source-induced transition occurred. For the /i/ vowel, in contrast, the fundamental frequency was regularly close to the first formant during the transition, indicating that an acoustically induced transition was caused by the source-filter interaction. In addition, the difference between these two mechanisms had little influence on the pitch jump width. We conclude that source-filter interaction is a contributing factor in the modal-falsetto transition, in agreement with previous studies.
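The proximity and jump-width quantities discussed above can be illustrated with a minimal sketch. It assumes that distances are expressed in semitones and that only the first few harmonics are checked; the function names, the `max_harmonic` parameter, and the choice of a semitone scale are illustrative assumptions, not the measures used in the study.

```python
import math


def nearest_harmonic(f0_hz, f1_hz, max_harmonic=5):
    """Find the harmonic n*f0 closest to the first formant F1.

    Returns (n, distance), where distance is the signed interval from F1
    to n*f0 in semitones. max_harmonic is a hypothetical cut-off.
    """
    best_n, best_dist = None, float("inf")
    for n in range(1, max_harmonic + 1):
        dist = 12.0 * math.log2(n * f0_hz / f1_hz)
        if abs(dist) < abs(best_dist):
            best_n, best_dist = n, dist
    return best_n, best_dist


def pitch_jump_semitones(f0_before_hz, f0_after_hz):
    """Width of an abrupt pitch jump, in semitones."""
    return 12.0 * math.log2(f0_after_hz / f0_before_hz)
```

Applied frame by frame to measured f0 and F1 tracks, a small absolute distance just before the jump would suggest that a harmonic was crossing the formant at the transition.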
For the scattering coefficient, which represents the acoustic scattering capability of architectural surfaces, a reverberation room method for measuring random-incidence values has been standardized in ISO 17497-1, whereas no method has yet been fully established for incidence-angle-dependent values. In this paper, a laboratory measurement method is proposed for normal-incidence scattering coefficients, which will be useful for room acoustics design, particularly for the assessment of flutter echoes. The measurement is performed in a rectangular room where highly absorbent materials are installed on all sidewalls and a test sample is mounted on the entire floor. In the quasi-one-dimensional sound field between the floor and ceiling, normal-incidence scattering coefficients of the sample are estimated from the change in reverberation time with and without the sample. To establish the proposed method, the measurement procedure and the test arrangement are examined in 1/4-scale experiments, and the method is then validated by comparison with theoretical and numerical results.
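To make the idea of estimating scattering from a change in reverberation time concrete, the sketch below uses a deliberately simplified one-dimensional decay model. It is not the estimation formula of the paper, which the abstract does not give. The assumptions are: sound energy bounces between floor and ceiling, all energy scattered by the sample is absorbed by the sidewalls, the sample absorbs the same amount as the bare floor, and air absorption is neglected. All function names and the default sound speed are hypothetical.

```python
import math


def round_trip_loss(rt_s, height_m, c=343.0):
    """-ln(energy fraction kept per floor-ceiling round trip).

    Follows from kept**(c*T/(2*H)) = 1e-6 for a 60 dB decay time T
    in a one-dimensional field of height H (assumed model).
    """
    return 12.0 * math.log(10.0) * height_m / (c * rt_s)


def rt_from_kept_fraction(kept_fraction, height_m, c=343.0):
    """Inverse of round_trip_loss: decay time for a given kept fraction."""
    return 12.0 * math.log(10.0) * height_m / (c * -math.log(kept_fraction))


def scattering_from_rt(rt_without_s, rt_with_s, height_m, c=343.0):
    """Scattering coefficient s under the assumed model.

    The sample multiplies the kept fraction by (1 - s), so the extra
    round-trip loss equals -ln(1 - s).
    """
    extra = round_trip_loss(rt_with_s, height_m, c) - round_trip_loss(
        rt_without_s, height_m, c
    )
    return 1.0 - math.exp(-extra)
```

Under these assumptions a shorter reverberation time with the sample installed maps to a larger scattering coefficient, and equal reverberation times give zero.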
Previously, methods for estimating the performance of noisy speech recognition based on a spectral distortion measure have been proposed. Although they give an estimate of recognition performance without actually performing speech recognition, no consideration is given to any change in the components of a speech recognition system. To solve this problem, we propose a novel method for estimating the performance of noisy speech recognition, a major feature of which is the ability to accommodate the use of different noise reduction algorithms and recognition tasks by using two cepstral distances (CDs) and the square mean root perplexity (SMR-perplexity). First, we verified the effectiveness of the proposed distortion measure, i.e., the two CDs. The experimental results showed that the use of the proposed distortion measure achieves estimation accuracy equivalent to the use of the conventional distortion measures, the perceptual evaluation of speech quality (PESQ) and the signal-to-noise ratio (SNR) of noise-reduced speech, and has the advantage of being applicable to noise reduction algorithms that directly output the mel-frequency cepstral coefficient (MFCC) feature. We then evaluated the proposed method by performing a closed test and an open test (10-fold cross-validation test). The results confirmed that the proposed method gives better estimates without being dependent on the differences among the noise reduction algorithms or the recognition tasks.
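As an illustration of the kind of distortion measure involved, the following sketch computes a cepstral distance in decibels using the commonly cited textbook definition, with the zeroth coefficient assumed to be excluded by the caller. Which signal pairs feed the two CDs of the proposed method is not stated in the abstract, so the argument names here are placeholders rather than the authors' configuration.

```python
import math


def cepstral_distance_db(c_ref, c_test):
    """Cepstral distance in dB between two cepstral vectors of one frame.

    Uses CD = (10 / ln 10) * sqrt(2 * sum_k (c_ref[k] - c_test[k])**2),
    with c0 assumed to be already removed from both vectors.
    """
    if len(c_ref) != len(c_test):
        raise ValueError("cepstral vectors must have the same length")
    squared = sum((a - b) ** 2 for a, b in zip(c_ref, c_test))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * squared)


def mean_cepstral_distance_db(frames_ref, frames_test):
    """Average the frame-wise cepstral distance over paired frames."""
    if len(frames_ref) != len(frames_test) or not frames_ref:
        raise ValueError("need the same non-zero number of frames")
    total = sum(
        cepstral_distance_db(r, t) for r, t in zip(frames_ref, frames_test)
    )
    return total / len(frames_ref)
```

Because the measure operates directly on cepstral vectors, it can be applied to a noise reduction algorithm that outputs MFCC features without resynthesizing a waveform, which is the advantage noted above.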
This paper develops a numerical method to analyze the membrane vibration of a membranophone with nonuniform heads, in which the density and tension vary smoothly. A spectral method is applied to numerically solve the wave equation governing a membrane with spatially varying areal density and tension. The Indian tabla, representing nonuniform density, and a typical tom-tom with several tension rods, representing nonuniform tension, are analyzed using the proposed numerical method to determine their eigenfrequencies and eigenmode shapes.
The theory of categorical perception of speech sounds traditionally holds that speech sound discrimination is based on phonemic labeling, an abstract speech representation that listeners are hypothesized to have. However, recent research has found that the impact of labeling on the perception of an English /ɹ/–/l/ contrast may depend on the surrounding sound context: the effects of phonemic labeling may disappear when the speech sounds to be discriminated are presented in a sentence. The purpose of the present research is to investigate (1) the effects of sound context on the categorical perception of speech sounds, and (2) the cross-linguistic extensibility of such an effect. The experiments employed a Japanese voiced stop consonant continuum, i.e., /ba/–/da/, and tested discrimination of sounds on the continuum by native speakers of Japanese. Experiment 2 in particular investigated whether sounds on such a continuum are discriminated in accordance with the labeling when the sound in question is inserted into a sentence. The experiments show cross-linguistic effects of surrounding sound context, although there may be some exceptions. This research proposes a reconsideration of the role of labeling mediation in speech perception.