This paper describes the dependence of the loudness difference limen on the interstimulus interval, obtained by taking the presentation order effect of sound stimuli into account. A paired-comparison experiment with pure tones was carried out at different interstimulus intervals, and the experimental data were analyzed with the presentation order effect taken into account. Two characteristic effects were obtained. First, the difference limen changed with the interstimulus interval: a logarithmic relation was found between the difference limen and the interstimulus interval, and the difference limen was about 0.6 to 1.6 dB at intervals of 0.5 to 64 s. Second, the order effect also changed with the interstimulus interval. Although the two sounds had the same sound pressure level, the sound presented first was perceived as louder than the sound presented second at intervals of 0.5 to 4 s, whereas the sound presented second was perceived as louder at intervals of 16 to 64 s. At an interval of 8 s, the two sounds were perceived as equally loud. Based on these findings, we estimated the region in which the difference in loudness could not be detected. The results show that this region is not symmetrical with respect to the upper and lower boundary levels.
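The reported logarithmic relation can be sketched as a simple fit of the difference limen (DL) against the interstimulus interval (ISI). The coefficients below are hypothetical, chosen only so that the two reported endpoints (about 0.6 dB at 0.5 s and 1.6 dB at 64 s) are reproduced; they are not the paper's fitted values.

```python
import math

# Hypothetical logarithmic fit of the loudness difference limen (DL, dB)
# to the interstimulus interval (ISI, s). The slope and intercept are
# illustrative, fixed by the two endpoints reported in the abstract.
def difference_limen_db(isi_s: float) -> float:
    a = 0.6 + 1.0 / 7.0   # hypothetical DL at ISI = 1 s (intercept)
    b = 1.0 / 7.0         # hypothetical dB increase per doubling of ISI
    return a + b * math.log2(isi_s)

# Endpoints reproduce the reported range:
# difference_limen_db(0.5) -> 0.6, difference_limen_db(64) -> 1.6
```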
This study analyzed coarticulation in continuous speech based on articulographic data obtained from three Japanese male speakers. The distribution of the articulation points of vowels and consonants revealed that speakers may compensate for morphological differences in the hard palate by adjusting the location of vowel articulation points. To evaluate the effects of the preceding and following phonemes on the target phoneme, three-phoneme sequences, consisting of the five Japanese vowels (V) and ten apical and two palatal consonants (C), were extracted from read sentences and used in the analysis. A stepwise multiple regression method was applied to the phoneme sequences to evaluate the “contributions” of the surrounding phonemes to the central one. The results showed that the horizontal component of the articulatory movement had a dominant function during articulation. The movement of the tongue tip was highly correlated with the tongue dorsum movement in the horizontal dimension, but was almost independent of it in the vertical dimension. This suggests that coarticulation in VCV sequences can be regarded as an independent consonantal gesture superimposed on the transitional portion between vowels. For apical-vowel combinations, the preceding consonant in CVC had a stronger effect on the vowel than the following one, but there was no dominance caused by the positions of the vowels in VCV sequences. For palatal-vowel combinations, the following phoneme showed a greater effect than the preceding phoneme in CVC sequences. In VCV sequences, the open and closed vowels behaved differently.
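The "contribution" analysis named above can be sketched as a forward stepwise regression, where each candidate predictor's contribution is read off the gain in explained variance (R²) it adds. The data, variable names, and coefficients below are synthetic placeholders, not the articulographic measurements.

```python
import numpy as np

# Hypothetical data: the target phoneme's articulation coordinate depends
# more strongly on the preceding phoneme than on the following one.
rng = np.random.default_rng(0)
n = 200
preceding = rng.normal(size=n)   # e.g. preceding phoneme's tongue position
following = rng.normal(size=n)   # e.g. following phoneme's tongue position
target = 0.8 * preceding + 0.2 * following + 0.1 * rng.normal(size=n)

def r_squared(X, y):
    # ordinary least squares with an intercept column
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid.var() / y.var()

# Forward stepwise selection: add the predictor with the largest R^2 gain
# until the gain becomes negligible; the gains are the "contributions".
candidates = {"preceding": preceding, "following": following}
selected_cols, selected_names, contributions, r2 = [], [], {}, 0.0
while candidates:
    gains = {name: r_squared(np.column_stack(selected_cols + [x]), target) - r2
             for name, x in candidates.items()}
    best = max(gains, key=gains.get)
    if gains[best] <= 0.01:
        break
    selected_cols.append(candidates.pop(best))
    selected_names.append(best)
    contributions[best] = gains[best]
    r2 += gains[best]
```

With these synthetic data the preceding-phoneme predictor enters first with the larger contribution, mirroring the dominance pattern the analysis is designed to detect.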
A peripheral auditory model that accounts for masking by positive and negative Schroeder-phase complex maskers was proposed. In addition, the proposed model provides reasonable frequency selectivity, whereas a previous model using the analytic gammachirp filter did not. The proposed model is a cascade of a fixed filter for the outer/middle ear response, a compressive gammachirp filterbank for the auditory filters, a half-wave rectifier, a direct-current (DC) component adder, a leaky integrator, and a detection module. Masking data for a short signal under the Schroeder-phase complex masker and notched-noise masking data were collected from the same human listeners to measure the phase characteristics and frequency selectivity of the auditory periphery. The peripheral auditory model was used to simulate masking of the short signal, and the power spectrum model of masking was used to simulate the notched-noise masking data. Parameter fitting of both models was conducted simultaneously, using the same parameter values of the compressive gammachirp filter, for the individual and mean data. The fitted parameters of both models provide a good account of the human masking data.
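The post-filterbank stages of the cascade can be sketched schematically. The outer/middle-ear filter and compressive gammachirp filterbank are omitted here, and the time constant and DC offset are illustrative placeholders, not the paper's fitted parameter values.

```python
import numpy as np

def half_wave_rectify(x):
    # keep only positive half-waves of the filterbank output
    return np.maximum(x, 0.0)

def leaky_integrate(x, fs, tau=0.003):
    # first-order leaky integrator; tau = 3 ms is an illustrative value
    a = np.exp(-1.0 / (fs * tau))
    y = np.empty_like(x)
    acc = 0.0
    for i, v in enumerate(x):
        acc = a * acc + (1.0 - a) * v
        y[i] = acc
    return y

fs = 16000
t = np.arange(0, 0.05, 1.0 / fs)
channel = np.sin(2 * np.pi * 1000 * t)   # one filterbank channel output (toy)
dc_offset = 0.05                          # illustrative DC component
envelope = leaky_integrate(half_wave_rectify(channel) + dc_offset, fs)
```

A decision stage (the detection module) would then compare such envelopes for masker-alone and masker-plus-signal intervals.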
The relationships between the various shapes of a Buddhist temple bell and the corresponding acoustic characteristics are clarified mainly by finite element method (FEM) analysis. First, we show that the cross-sectional shape of the “komazume” (the lower part of the bell, which is slightly thicker than the rest) is highly correlated with the vibration modes as well as with the vibration positions; as a result, the komazume has a large influence on the bell’s acoustic characteristics. The beat sound is another important component of the bell’s overall sound, but its origin has not yet been fully clarified. Since the beat sound is assumed to be generated by asymmetries in the bell’s shape or material, the influences of several shape asymmetries on the beat sound are investigated. First, the beat sound obtained by simulation of miniaturized bells is confirmed to closely match the experimental results. It is then shown that the “doza” (the part where the bell is struck), which acts as a shape asymmetry when the bell vibrates, is strongly related to the beat sound. It is also clarified that the “obi” (the perpendicular stripe pattern on the bell’s surface) slightly influences the beat sound.
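The mechanism behind the beat sound can be illustrated with a toy signal: an asymmetry splits a degenerate vibration mode into two close frequencies f1 and f2, and their superposition beats at the rate |f1 − f2|. The frequencies below are illustrative, not measured bell partials.

```python
import numpy as np

fs = 8000
t = np.arange(0, 2.0, 1.0 / fs)
f1, f2 = 110.0, 111.5                     # hypothetical split mode pair (Hz)
x = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

# The superposition is amplitude-modulated; the audible beat rate is the
# frequency difference of the split pair.
beat_rate_hz = abs(f1 - f2)               # 1.5 beats per second
```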
We propose a Vietnamese Text-To-Speech (VieTTS) system, a parametric, rule-based speech synthesis system. The fundamental speech units of the system are demisyllables carrying the Level tone. VieTTS uses a source-filter model for speech production, with a Log Magnitude Approximation (LMA) filter as the vocal tract filter. We chose the Hanoi dialect for VieTTS. Vietnamese tone synthesis is implemented by controlling fundamental frequency (F0) patterns and power patterns. F0 is the most important factor in Vietnamese tone synthesis, and power control strongly affects the Broken and Drop tones. Applying power control to tone synthesis distinguishes Vietnamese from other tonal languages such as Chinese and Thai. This result was confirmed by listening tests with a reasonable correct-response rate.
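The F0-plus-power control idea can be sketched abstractly: each tone is realized as an F0 contour together with a power (amplitude) contour over the syllable. The contours below are schematic placeholders, not VieTTS's actual rules; the mid-syllable power dip merely stands in for the kind of amplitude control that matters for the Broken and Drop tones.

```python
import numpy as np

n = 100                        # analysis frames across one syllable
t = np.linspace(0.0, 1.0, n)

f0 = 120.0 * np.ones(n)        # flat contour, schematic Level-like tone (Hz)
power = np.ones(n)             # relative power pattern
power[40:60] = 0.2             # hypothetical mid-syllable power dip

# A synthesizer would drive the source (F0) and output gain (power)
# frame by frame from contours like these.
```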
Pressure sensitivities of laboratory standard microphones are acoustic standards required for precise measurement in the audible frequency range. The National Metrology Institute of Japan (NMIJ) has developed a pressure calibration system that enables absolute calibration of pressure sensitivities. In this study, we analyzed in detail the measurement uncertainty of pressure sensitivities calibrated at NMIJ, using the pressure sensitivity formula and examining every component that contributes to the uncertainty. The analysis revealed several dominant components: the repeatability of the voltage transfer function measurement, the cavity volume of the coupler, the capillary tube correction factor, and the instability of the microphone. Within the frequency range in which the expanded uncertainty was constant, the expanded uncertainty (coverage factor k = 2) was 0.04 dB for the Brüel & Kjær 4160, a type LS1P microphone, and 0.09 dB for the Brüel & Kjær 4180, a type LS2aP microphone.
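The standard way such components are combined into an expanded uncertainty (following the GUM) is a root-sum-square of the independent standard uncertainties, multiplied by the coverage factor k = 2. The component values below are illustrative placeholders, not the actual NMIJ budget.

```python
import math

# Hypothetical standard uncertainties of independent components (dB)
components_db = {
    "voltage_transfer_repeatability": 0.010,
    "coupler_cavity_volume": 0.008,
    "capillary_tube_correction": 0.006,
    "microphone_instability": 0.012,
}

# Combined standard uncertainty: root-sum-square of independent components
u_c = math.sqrt(sum(u**2 for u in components_db.values()))

# Expanded uncertainty with coverage factor k = 2
U = 2.0 * u_c
```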
The focusing property of time reversal waves (phase conjugate waves) in shallow or deep water has been discussed on the basis of normal-mode simulations. In this study, the focusing property of time reversal waves was verified through tank experiments, and it was confirmed that time reversal converges acoustic waves at the focus. Furthermore, the simulation results qualitatively agreed with the experimental results. This study also investigated how the configuration of the elements of the Time Reversal Mirror (TRM) array affects the focusing property, and the following points were clarified. The signal received at the focus is not disrupted when the number of elements in the TRM array is reduced. Even if the TRM array is tilted or the heights of its elements are shifted by a fixed amount, the signal received at the focus is not disrupted; however, if the tilt or height shift varies over time, the time reversal waves lose their focusing property. Even a horizontal TRM array can generate time reversal waves if it is sufficiently long.
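The refocusing principle can be illustrated in a single channel: a probe pulse from the focus is received through the propagation impulse response h, time-reversed, and retransmitted through the same channel, so the field at the focus becomes the autocorrelation of h, which peaks sharply at one instant. The multipath response below is a toy example, not measured data.

```python
import numpy as np

h = np.zeros(64)
h[[3, 17, 40]] = [1.0, 0.6, 0.3]           # toy multipath arrivals

probe = np.zeros(32)
probe[0] = 1.0                              # probe pulse sent from the focus

received = np.convolve(probe, h)            # signal arriving at the TRM array
refocused = np.convolve(received[::-1], h)  # retransmit the reversed signal

# The refocused field is the autocorrelation of h: its peak equals the
# energy of h (1.0^2 + 0.6^2 + 0.3^2 = 1.45), concentrated at one lag.
peak = refocused.max()
```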
To investigate the effect of room acoustic conditions on musicians on stage, the authors have been conducting experimental studies of solo performances, using a newly developed sound field simulation method with a six-channel recording/reproduction technique. By extending this system, a new simulation system that duplicates the situation in which two musicians play as an ensemble has been developed, using two anechoic rooms and a 24-channel digital convolution system. Using this system, experiments were conducted to check the applicability of the sound field simulation technique to the experimental investigation of stage acoustics for ensemble performance between two players.
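The core operation of such a convolution-based simulation can be sketched as follows: each player's anechoic (dry) signal is convolved with a room impulse response and the results are summed for each reproduction channel. The signal lengths and impulse responses below are toy placeholders, not the system's measured responses.

```python
import numpy as np

fs = 48000
rng_a, rng_b = np.random.default_rng(2), np.random.default_rng(3)
dry_a = rng_a.normal(size=fs)    # player A, 1 s of anechoic signal (toy)
dry_b = rng_b.normal(size=fs)    # player B, 1 s of anechoic signal (toy)

# Toy impulse responses: direct sound plus one reflection each
ir_a = np.zeros(2400); ir_a[0] = 1.0; ir_a[960] = 0.4
ir_b = np.zeros(2400); ir_b[0] = 1.0; ir_b[1440] = 0.3

# One reproduction channel: sum of each dry signal through its response
channel = np.convolve(dry_a, ir_a) + np.convolve(dry_b, ir_b)
```

In the actual system this operation runs in real time on 24 channels, so that each musician hears both performances through the simulated room.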