The purpose of this study is to derive objective measures that can accurately represent the characteristics of a sound field regarding the tonal balance corresponding to our sense of hearing. Two types of listening tests were conducted in the form of paired comparison, in which subjects were tested using sound fields produced by convolving anechoic music sources with impulse responses. In the first listening tests, impulse responses were calculated theoretically for a simple sound field structure consisting of a direct sound and reflections, and in the second tests, impulse responses were measured at various seats in concert halls. In the latter case, impulse responses that give almost the same distinctness were used in the listening tests. From this investigation, it is found that one objective measure, named deviation of level (DL), is a potentially effective measure for evaluating the tonal balance of sound fields. The index DL is calculated from data on a logarithmic scale in both frequency and magnitude.
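The stimulus construction described above (an impulse response made of a direct sound plus reflections, convolved with an anechoic source) can be sketched as follows. This is only an illustrative sketch: the path delays, gains, and the three-sample "dry" signal are hypothetical values, not those used in the study, and the DL index itself is not computed because its formula is not given here.

```python
def build_impulse_response(paths, length):
    """Place each (delay_in_samples, gain) path as one tap of the response."""
    h = [0.0] * length
    for delay, gain in paths:
        h[delay] += gain
    return h


def convolve(signal, h):
    """Full linear convolution of a dry signal with an impulse response."""
    out = [0.0] * (len(signal) + len(h) - 1)
    for i, s in enumerate(signal):
        for j, tap in enumerate(h):
            out[i + j] += s * tap
    return out


# Hypothetical sound field: direct sound plus two reflections.
paths = [(0, 1.0), (3, 0.5), (7, 0.25)]
impulse_response = build_impulse_response(paths, length=8)

# Hypothetical stand-in for a short anechoic ("dry") recording.
dry = [1.0, 0.5, -0.5]
wet = convolve(dry, impulse_response)
```

Each reflection appears in `wet` as a delayed, scaled copy of the dry signal, which is what changes the tonal balance heard by the listener.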
When an object buried in an underwater sediment layer is detected from an oblique direction, detection is often hampered by reflection from the bottom surface. In other words, when a sound wave is directed obliquely at the object in the sediment, the sound-wave energy that reaches the object is small because of reflection and refraction at the bottom surface. To alleviate that problem, a pseudo sound source can be set up in the sediment layer. The sound wave radiated from the pseudo sound source is then received by a time-reversal array. The sound wave can be injected into the sediment layer by time-reversing the received signal and radiating it again from the time-reversal array. The beam can be made to hit the target by moving this pseudo sound source along the bottom surface. The sound wave that hits the target is reflected and returns into the water. It is subsequently reflected many times at the surface and bottom, and is diffused over time. Passive phase conjugate processing is then applied to the received diffuse signal to reduce this diffusion. The pulse is compressed by this processing, and the target position is thereby obtained.
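The pulse compression obtained by passive phase conjugation can be illustrated with a single-channel sketch: the spread arrival is cross-correlated with a previously received probe response, so the multipath taps add coherently at zero lag. The channel taps below are hypothetical, and the sketch assumes one receiver rather than the array processing described above.

```python
def cross_correlate(received, reference):
    """Full cross-correlation; zero lag sits at index len(reference) - 1."""
    n_ref = len(reference)
    out = [0.0] * (len(received) + n_ref - 1)
    for i, r in enumerate(received):
        for j, p in enumerate(reference):
            out[i - j + n_ref - 1] += r * p
    return out


# Hypothetical multipath channel: a first arrival followed by
# surface and bottom reflections with assumed delays and gains.
channel = [1.0, 0.0, 0.0, 0.6, 0.0, -0.4, 0.0, 0.0, 0.3]

# With an ideal impulse as the transmitted pulse, both the probe and
# the later echo arrive as the channel response itself.
probe_received = list(channel)
echo_received = list(channel)

# Correlating the spread echo with the probe response yields the
# autocorrelation of the channel, which is sharply peaked.
compressed = cross_correlate(echo_received, probe_received)
peak_index = max(range(len(compressed)), key=lambda k: abs(compressed[k]))
zero_lag = len(probe_received) - 1
```

The energy that was spread over several arrivals is gathered into one peak at `zero_lag`, whose height equals the total energy of the channel taps.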
An acoustic method that can be used in air has the potential to allow for a fast and accurate characterization of objects in air. Nevertheless, it is difficult to identify acoustic signals from small objects clearly because of environmental noise and the scattering of sound on the object surface. Therefore, a sensing system that enables the measurement of small objects in air must be developed. In this study, we performed the localization of small objects of size comparable to the sound wavelength using an M-sequence signal and the phase information of received signals in a noisy indoor environment. Using the M-sequence signal, we were able to improve the signal-to-noise (SN) ratio and to stably measure reflected waves that cannot be detected using a conventional impulse. The arrival direction information was used to separate signals reflected by targets from unwanted reflections from the floor or ceiling. Using an M-sequence signal together with the arrival direction information made it possible to detect the positions of small objects in the indoor environment.
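The SN-ratio gain of an M-sequence comes from its two-valued circular autocorrelation, which can be sketched as follows. The register length (5 stages, 31 chips), the feedback taps, the echo delay, and the noise level are all assumptions chosen for illustration, not the parameters of the measurement system described above.

```python
import random


def m_sequence():
    """31-chip M-sequence from a 5-stage LFSR, mapped to +1/-1 chips."""
    state = [1, 1, 1, 1, 1]          # any non-zero start state works
    chips = []
    for _ in range(2 ** 5 - 1):
        chips.append(1.0 - 2.0 * state[4])   # bit 0 -> +1, bit 1 -> -1
        feedback = state[4] ^ state[1]       # assumed primitive feedback taps
        state = [feedback] + state[:4]
    return chips


def circular_correlation(received, reference):
    """Circular cross-correlation of one received period with the code."""
    n = len(reference)
    return [
        sum(received[(i + lag) % n] * reference[i] for i in range(n))
        for lag in range(n)
    ]


code = m_sequence()
autocorrelation = circular_correlation(code, code)

# Hypothetical echo: the code delayed by 11 chips plus bounded noise.
true_delay = 11
rng = random.Random(0)
received = [
    code[(m - true_delay) % len(code)] + rng.uniform(-0.45, 0.45)
    for m in range(len(code))
]
correlation = circular_correlation(received, code)
estimated_delay = max(range(len(correlation)), key=lambda k: correlation[k])
```

The autocorrelation is 31 at zero lag and -1 elsewhere, so correlating with the code concentrates the echo energy into one lag while the noise stays spread over all lags.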
Acoustic impedance determines the boundary condition of each sound field, but collections of actual values to evaluate sound fields are insufficient. Therefore, measurements of acoustic impedance using a particle velocity sensor were taken on different fields. Such measurement results were used for sound propagation calculations. Frequency characteristics of sound propagation on grass, snow-covered, and porous drainage pavement surfaces showed fair correspondence with field measurement results. Subsequently, fine calculations in the frequency domain were converted to impulse responses for each sound field model. Convolution operations based on the impulse response and on voice, music, and other noise sources readily produced an ideal sound field for the audible sound file. Furthermore, simulations of noise from a car running through a paved drainage area, with noise reduction effects, were attempted as advanced applications.
This research is aimed at understanding how time-frequency features are extracted from frequency-modulated (FM) sounds in auditory cortical neurons. We investigated spatiotemporal response patterns in the primary auditory cortex of the guinea pig, using an optical recording method (MiCAM Ultima; Brain Vision) with a voltage-sensitive dye (RH795). Experiments were performed under anesthesia (ketamine, 100 mg/kg and xylazine, 25 mg/kg). A pure tone (PT) evoked a strong on-response that was followed by an inhibitory phase lasting for approximately 100 ms. An exponential FM sound evoked an additional activity moving along the frequency axis. Such an FM response became more evident when the frequency of the FM sound was modulated over a wider range. The position of the peak amplitude of the later FM response was shifted in accordance with the direction of the FM sweep, but the distance of the shift tended to be around 1 mm. Accordingly, the FM response sometimes appeared in isofrequency areas outside those covering the frequency range of the FM sound. These results indicate that interactions between excitatory and inhibitory neurons provide a robust FM detector that makes maximal use of the frequency-response area (FRA) of neurons distributed along the frequency axis in the auditory cortex.
The Satsuma biwa and the cello are compared from the viewpoint of their wood properties. According to the wood classification diagram, the mulberry traditionally used for the biwa is very far from the Western criteria for the resonance woods such as sitka spruce and maple respectively used for the top and back plate of the cello. The structural responses of these instruments are investigated by measuring the driving-point mobility and the transmission mobility of the top plate. The cello is designed to stress the fundamental, while the biwa is constructed to sustain the higher harmonics that are generated by the “sawari” mechanisms applied to the nut and frets. Since the sawari tone yields a reverberating high-frequency emphasis, it is auditorily discriminated from the lower harmonics, which depend on the mode vibrations of the top plate and the bridge. In addition, the camphor-made biwa is compared with the mulberry-made biwa on their structural responses and the resulting sound spectrograms. The camphor wood is not an excellent substitute for the mulberry. Furthermore, the acoustical features of other Asian stringed instruments, where the paulownia and amboyna wood are used, are briefly discussed in relation to the playing style and musical taste.
There are many music systems available on the market, such as systems that automatically arrange music pieces given as note sequences for solo piano into a piano score in a specific style. These systems, however, are usually designed to generate music by concatenating existing arrangement patterns, so they cannot be expected to satisfy user requirements. We propose a system in which a given melody expressed as a note sequence is arranged into a modern Jazz-style score for the piano on the basis of the “Jazz theory,” a theory of harmony used in Jazz and popular music. The performance of the proposed system is evaluated by comparing its results with those obtained using popular arrangement systems available on the market. Experimental results show that arrangement using the proposed system is significantly superior to arrangement using systems available on the market.
Introduced is a database used for research consisting of songs sung in traditional Japanese and western styles to clarify the acoustical differences between them. Seventy-eight top-class singers covering 31 music genres are recorded in this database including 18 “Living National Treasures” singing in traditional Japanese styles. The database includes the five vowels of Japanese uttered naturally and sung by the singers in the style of their genre. Recordings were made in anechoic chambers using a digital tape recorder. The database consists of 18 CDs and an explanatory book. Shown in this paper are examples of comparative studies on the acoustical features of vibrato in traditional Japanese singing and bel canto, together with studies on formant shifts from natural utterance to singing.
In this paper, we introduce a new method of robust speech recognition under noisy conditions based on discrete-mixture hidden Markov models (DMHMMs). DMHMMs were originally proposed to reduce calculation costs in the decoding process. Recently, we have applied DMHMMs to noisy speech recognition, and found that they were effective for modeling noisy speech. Towards the further improvement of noise-robust speech recognition, we propose a novel normalization method for DMHMMs based on histogram equalization (HEQ). The HEQ method can compensate for the nonlinear effects of additive noise. It is generally used for the feature space normalization of continuous-mixture HMM (CMHMM) systems. In this paper, we propose both model space and feature space normalization of DMHMMs by using HEQ. In the model space normalization, codebooks of DMHMMs are modified by the transform function derived from the HEQ method. The proposed method was compared with both conventional CMHMMs and DMHMMs. The results showed that the model space normalization of DMHMMs by multiple transform functions was effective for noise-robust speech recognition.
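The basic idea of feature-space HEQ can be sketched as a rank-based mapping from the empirical distribution of the noisy features onto that of reference (clean) features. This is a generic, one-dimensional illustration under assumed data; it is not the model-space codebook transform proposed above, and the distortion function and feature values are hypothetical.

```python
import math


def histogram_equalize(values, reference):
    """Map each value to the reference quantile matching its empirical CDF."""
    ref_sorted = sorted(reference)
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    equalized = [0.0] * n
    for rank, idx in enumerate(order):
        cdf = (rank + 0.5) / n                       # empirical CDF of the value
        k = min(int(cdf * len(ref_sorted)), len(ref_sorted) - 1)
        equalized[idx] = ref_sorted[k]               # inverse reference CDF
    return equalized


# Hypothetical clean feature values (one feature dimension).
clean = [0.0, -2.0, 1.0, 2.0, -1.0]

# Hypothetical monotonic nonlinear distortion standing in for noise effects.
noisy = [math.exp(0.5 * v) + 1.0 for v in clean]

restored = histogram_equalize(noisy, reference=clean)
```

Because the mapping depends only on ranks, any monotonic nonlinear distortion is undone when the reference distribution matches the clean one, which is the property that makes HEQ attractive for additive-noise compensation.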
In this paper, we investigate the effect of a competing noise source on the intelligibility of target speech. This work is part of an ongoing effort to realize a network-based conference system in which individual users participating from stand-alone PCs share a common virtual space, and we are currently focusing on the acoustic aspects of this conferencing system. We examined the improvement in the intelligibility of the primary speech when competing sources are localized away from the listener on the horizontal plane. The primary speaker was placed directly in front of the listener, and a single competing source was placed on the horizontal plane at various azimuths and distances. DRT intelligibility tests showed that intelligibility scores of over 70% are achieved when the competing source is placed at azimuths of more than 45° away from the primary speech source. Increasing the distance of the competing source further enhances the intelligibility scores, as expected. These results show the feasibility of a multiparty audio conferencing system in which speaker locations are carefully controlled to keep speech intelligible.
The attenuation coefficient and propagation speed of an airborne ultrasound wave are measured for highly porous open-cell polyurethane foams and fibers at frequencies from 1 kHz to 1.7 MHz. A theoretical model is proposed to explain physically the frequency responses of the insertion loss and the speed of an air-coupled wave in porous materials. The model is derived from Biot’s flow resistance and density, Lambert’s bulk modulus for fluids in pores, and Zwikker and Kosten’s concept for the compliance of the side holes with entrance resistance. Using measured data of static flow resistance to determine the mean pore size and the proposed model, theoretical prediction is performed for the transmission losses and sound speeds. Good agreement between theory and experiment over the entire frequency range confirms the usefulness of the present model. In addition, the model provides findings for Nagy’s extra attenuation coefficient for a slow wave measured in cemented glass bead specimens and in sandstone for high-frequency ranges.
Teachers belong to the group of professional voice users who often suffer from voice disorders. A reduction of voice capacity can impede or stop the practice of their profession. One reason for a significantly increased prevalence of voice problems can be poor room acoustical conditions in classrooms. About half of the teachers of a secondary modern school in Aachen were investigated with respect to their voice status by using phoniatric, logopedic and objective voice analysis methods. The prevalence of voice problems in this group was found to exceed that reported in previous studies where subjective voice quality was rated. Four rather reverberant and loud classrooms in that school were analysed using measurements of the reverberation time, T30, and the speech transmission index, STI. In a further part of this joint project, the change of voice quality during the teachers’ working day was analysed. Two of the four rooms were acoustically optimised. Members of two groups of teachers, with and without voice problems, were recorded before and after teaching in either one of the acoustically poor rooms or one of the newly renovated rooms. The results indicate statistically significant differences between the groups of subjects with respect to one or more voice parameters. Healthy subjects are less affected by unfavourable room acoustical conditions than subjects with voice problems.