Non-native speakers of Japanese often mispronounce Japanese singleton and geminate stops. Previous studies have pointed out that these mispronunciations are caused by inadequate closure duration, which is the primary acoustic cue distinguishing singleton from geminate stops. However, the durations of the segments preceding and following the closure have not been fully investigated. In this study, the durations of the closure and of the preceding and following consonant-vowel (CV) segments were analyzed to clarify the characteristics of Japanese singleton and geminate stops as mispronounced by Korean and Taiwanese Mandarin speakers. The results revealed that the non-native speakers produced singleton stops with a longer closure and a shorter preceding CV segment than the native Japanese speakers did. In contrast, they produced geminate stops with a shorter closure and a longer following CV segment than the native speakers did. These results indicate that non-native speakers' mispronunciations of Japanese singleton and geminate stops stem not only from inadequate closure duration but also from inadequate durations of the preceding and following segments. A likely cause is the difference in rhythmic units between Japanese and the non-native speakers' first languages.
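The three duration measures compared in this study can be illustrated with a short sketch. It assumes hand-labelled boundary times for a single token (e.g. /katta/) and uses a hypothetical boundary naming of our own; the example times are made up and are not data from the study.

```python
from typing import NamedTuple


class StopBoundaries(NamedTuple):
    """Hypothetical hand-labelled boundary times (s) around one stop."""
    cv1_onset: float      # start of the preceding consonant-vowel segment
    closure_onset: float  # end of the preceding vowel = start of the closure
    release: float        # stop release = start of the following CV segment
    cv2_offset: float     # end of the following vowel


def segment_durations(b: StopBoundaries) -> dict:
    """Durations (ms) of the preceding CV, the closure, and the following CV."""
    return {
        "preceding_cv": round(1000 * (b.closure_onset - b.cv1_onset), 1),
        "closure": round(1000 * (b.release - b.closure_onset), 1),
        "following_cv": round(1000 * (b.cv2_offset - b.release), 1),
    }


# Made-up example token, not measurements from the study.
durations = segment_durations(StopBoundaries(0.10, 0.25, 0.40, 0.55))
print(durations)
```

Comparing these per-token durations between native and non-native groups, rather than the closure alone, is what exposes the compensating changes in the neighbouring CV segments.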
Acoustic measurements of the vocal tract have shown that experienced saxophonists tune their vocal tract when performing advanced techniques, thereby effectively influencing the vibration frequency of the reed (Scavone et al., J. Acoust. Soc. Am., 123, 2391–2400 (2008); Chen et al., J. Acoust. Soc. Am., 129, 415–426 (2011)). To understand how the shape of the vocal tract is altered, the vocal tracts of experienced saxophonists were scanned in three dimensions with magnetic resonance imaging while they played notes at different pitches using normal and overtone techniques. The scanned images demonstrated that the tongue was located posteriorly in the vocal tract for low notes but moved forward when the participants produced overtones. The input impedance of the players' air columns, including both the supra- and sub-glottal tracts, was then calculated using an acoustic tube model. When the tongue moved forward to produce overtones, both the frequency and the amplitude of the second impedance peak increased, suggesting that the vocal behavior exerts an effective acoustic influence on the vibrating reed. The first impedance peak varied less, despite the marked change in vocal-tract shape across notes.
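For readers unfamiliar with acoustic tube models, the following is a minimal sketch of one common formulation: a chain of cylindrical sections, each represented by a lossless transmission (chain) matrix, with a small frequency-independent attenuation constant added so that the impedance peaks stay finite. It is not the model used in this study; the section areas and lengths, the air constants, the attenuation `alpha`, and the default open-end termination (`z_load=0`) are all assumptions for illustration.

```python
import numpy as np

RHO = 1.2   # air density in kg/m^3 (assumed room-temperature value)
C = 343.0   # speed of sound in m/s (assumed)


def input_impedance(areas, lengths, freqs, z_load=0.0, alpha=0.01):
    """Input impedance of concatenated cylindrical sections (chain-matrix model).

    areas, lengths: cross-sectional area (m^2) and length (m) of each section,
        ordered from the input end toward the far end.
    z_load: acoustic impedance terminating the far end (0.0 = ideal open end).
    alpha: hypothetical frequency-independent attenuation (1/m) that keeps
        the impedance peaks finite.
    """
    omega = 2 * np.pi * np.asarray(freqs, dtype=float)
    k = omega / C - 1j * alpha  # complex wavenumber (e^{j omega t} convention)
    # Running product of 2x2 chain matrices, stored element-wise per frequency.
    A, B = np.ones_like(k), np.zeros_like(k)
    Cm, D = np.zeros_like(k), np.ones_like(k)
    for area, length in zip(areas, lengths):
        zc = RHO * C / area  # characteristic impedance of this section
        a, b = np.cos(k * length), 1j * zc * np.sin(k * length)
        c, d = 1j * np.sin(k * length) / zc, np.cos(k * length)
        # Right-multiply by this section's matrix (input end first).
        A, B, Cm, D = A * a + B * c, A * b + B * d, Cm * a + D * c, Cm * b + D * d
    # [p_in, U_in] = M [p_far, U_far] with p_far = z_load * U_far.
    return (A * z_load + B) / (Cm * z_load + D)


# Sanity check: one uniform 0.17 m tube, open at the far end.
freqs = np.arange(100.0, 1000.0, 1.0)
z = input_impedance([3e-4], [0.17], freqs)
peak = freqs[np.argmax(np.abs(z))]
print(peak, C / (4 * 0.17))  # the first peak should lie close to c / (4L)
```

For a single uniform tube the first peak lands near c/(4L) ≈ 504 Hz, which serves as a quick check; a model of the vocal tract plus trachea would instead supply many short sections derived from an MRI-based area function.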
The magnitude spectrum is a popular mathematical tool for speech signal analysis. In this paper, we propose a new technique that improves the magnitude spectrum by exploiting the group delay (GD) spectrum so that vocal tract characteristics can be estimated accurately. The traditional magnitude spectrum has difficulty estimating vocal tract characteristics, particularly for high-pitched speech, owing to its low resolution and high spectral leakage. Phase-domain analysis shows that, because of its additive property, the GD spectrum has low spectral leakage and high resolution. Thus, the magnitude spectrum modified with its GD spectrum, referred to as the modified spectrum, is found to estimate formant frequencies significantly more accurately than traditional methods. Accuracy is tested on synthetic vowels over a wide range of fundamental frequencies extending up to the range of high-pitched female speakers. The validity of the proposed method is also verified by inspecting the formant contour of an utterance from the Texas Instruments and Massachusetts Institute of Technology (TIMIT) database and a standard F2–F1 plot of natural vowels spoken by male and female speakers. The results are compared with those of two state-of-the-art methods, and the proposed method outperforms both.
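The GD spectrum can be computed without phase unwrapping from the identity τ(ω) = Re{Y(ω)X*(ω)}/|X(ω)|², where X and Y are the DFTs of x[n] and n·x[n]. The sketch below applies this standard formula to a hypothetical two-pole resonator standing in for a single formant; it illustrates why GD peaks are sharp at resonances, but it does not reproduce the authors' modified spectrum, whose exact definition is not given here.

```python
import numpy as np


def group_delay(x, nfft=1024):
    """Group delay (samples) of a finite sequence via the n*x[n] identity.

    tau(w) = Re{Y(w) X*(w)} / |X(w)|^2 with X = DFT{x[n]}, Y = DFT{n x[n]}.
    Only non-negative-frequency bins are returned (rfft).
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, nfft)
    Y = np.fft.rfft(n * x, nfft)
    return np.real(Y * np.conj(X)) / np.abs(X) ** 2


def resonator_impulse_response(f_res, fs, r=0.95, length=512):
    """Impulse response of a hypothetical two-pole resonator (one 'formant').

    Implements y[n] = x[n] + 2 r cos(theta) y[n-1] - r^2 y[n-2].
    """
    theta = 2 * np.pi * f_res / fs
    a1, a2 = 2 * r * np.cos(theta), -r * r
    h = np.zeros(length)
    for i in range(length):
        x = 1.0 if i == 0 else 0.0  # unit impulse input
        y1 = h[i - 1] if i >= 1 else 0.0
        y2 = h[i - 2] if i >= 2 else 0.0
        h[i] = x + a1 * y1 + a2 * y2
    return h


fs, nfft = 8000, 1024
tau = group_delay(resonator_impulse_response(500.0, fs), nfft)
freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
print(freqs[np.argmax(tau)])  # GD maximum sits at the 500 Hz resonance
```

Because GD spectra of cascaded resonators add rather than multiply, each pole contributes a narrow peak of height about r/(1 − r) samples at its own frequency, which is the resolution advantage the abstract attributes to the additive property.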
We measured the input impedance, the input voltage versus output sound pressure characteristics, the harmonic distortion, the frequency response, and the impulse response of a currently available miniature electrodynamic driver unit (Foster Electric, MT006B) used as a loudspeaker with an open-space load. The nominal input impedance of the driver unit was 16 Ω, and its resonance frequency f0 was 2.7 kHz. At f0, the output sound pressure increased linearly with the input voltage for input levels of −10 dBV (re 1 V) or less. Below f0, the frequency response of the driver unit fell off at 15 dB/oct, whereas above f0 it did not drop significantly. With an input signal level of −12 dBV, the signal-to-noise ratio between the sound pressure level produced by the driver and the background noise level of the soundproof room was 0 dB at 130 Hz at a distance of 0.2 m and at 230 Hz at a distance of 1.0 m. These results indicate that the MT006B can be used as an earplug speaker in a fast head-related transfer function (HRTF) measurement system based on reciprocity.
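The decibel quantities above can be made concrete with a little arithmetic. The sketch below converts dBV (re 1 V rms) to volts and evaluates an idealized straight-line 15 dB/oct roll-off below f0 = 2.7 kHz; the straight-line model is an assumption for illustration, not the measured response curve.

```python
import math


def dbv_to_volts(level_dbv):
    """Convert a level in dBV (re 1 V rms) to volts rms."""
    return 10 ** (level_dbv / 20.0)


def idealized_rolloff_db(f, f0=2700.0, slope_db_per_oct=15.0):
    """Level (dB re the level at f0) of an idealized straight-line roll-off.

    Below f0 the level falls by slope_db_per_oct per octave; at and above f0
    the response is treated as flat (0 dB). This is an assumed model only.
    """
    if f >= f0:
        return 0.0
    return -slope_db_per_oct * math.log2(f0 / f)


print(dbv_to_volts(-10.0))           # linear-range limit at f0, about 0.316 V
print(dbv_to_volts(-12.0))           # drive level used for the SNR test, about 0.251 V
print(idealized_rolloff_db(1350.0))  # one octave below f0
print(idealized_rolloff_db(675.0))   # two octaves below f0
```

One octave below f0 the idealized response sits 15 dB below its level at f0, and two octaves below it sits 30 dB below; how far down in frequency the measured slope actually holds is not stated here.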