Signal processing methods that accurately synthesize the sound pressure at the ears are important in the development of spatial audio devices for personal use. This paper reviews current methods and focuses on a promising class that combines the spatial information available in microphone array recordings with datasets of head-related transfer functions (HRTFs). Together, these two kinds of spatial information allow dynamic and individual auditory localization cues to be taken into account during binaural synthesis. A general formulation for this class of methods is presented in terms of a linear system of equations whose associated matrix is composed of acoustic transfer functions relating the positions of the microphones and HRTFs. Based on this formulation, it is shown that most of the existing methods under consideration fall into two prominent approaches: 1) the HRTF modeling approach and 2) the microphone signal modeling approach. The general formulation makes an important relation between the two approaches evident: when one approach arises from the solution of an overdetermined system, the other corresponds to an underdetermined system, and vice versa. Illustrative examples of binaural synthesis from spherical arrays are provided by means of simulations, in which underdetermined systems generally achieve better performance than overdetermined ones.
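The overdetermined/underdetermined duality can be sketched numerically. The transfer matrix, its dimensions, and the signals below are hypothetical stand-ins for one frequency bin, not the paper's actual formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical transfer matrix G relating Q microphone positions to
# P HRTF directions at one frequency (complex-valued in practice).
Q, P = 8, 16                      # more HRTF directions than microphones
G = rng.standard_normal((Q, P)) + 1j * rng.standard_normal((Q, P))
p = rng.standard_normal(Q) + 1j * rng.standard_normal(Q)   # mic signals

# Underdetermined system G w = p: minimum-norm solution via pseudoinverse.
w_min_norm = np.linalg.pinv(G) @ p

# Overdetermined counterpart with the conjugate-transposed matrix:
# least-squares solution of G^H v = w_des for a desired weight vector.
w_des = rng.standard_normal(P) + 1j * rng.standard_normal(P)
v_ls, *_ = np.linalg.lstsq(G.conj().T, w_des, rcond=None)

# The minimum-norm solution reproduces the microphone signals exactly.
residual = np.linalg.norm(G @ w_min_norm - p)
```

The same matrix thus plays both roles: which system is over- or underdetermined depends only on which side of it is modeled.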
A previous study indicated that a microphone-array system consisting of seven microphones and a neural network can realize sharp directional sensitivity, but only at a single frequency. In this work, we propose a new system with a modified input structure. Whereas the previous system was trained on spatial patterns at a single frequency, the proposed system is trained on temporal-spatial patterns of the sound pressure distributions for sinusoidal signals at multiple frequencies. Three frequencies (425, 850, and 1,700 Hz) are used to train the neural network of the proposed system. A computational simulation shows that the proposed system realizes sharp sensitivity with a half width of 5° at 425–1,700 Hz, including untrained frequencies. Moreover, in an examination using an amplitude-modulated (AM) or frequency-modulated (FM) wave as the input signal, the proposed system outperforms the previous one.
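How such temporal-spatial training patterns might be assembled can be sketched as follows; the array geometry, sampling rate, window length, and angle grid here are assumptions for illustration, not the paper's configuration:

```python
import numpy as np

# Assumed setup: 7 microphones in a uniform line, plane waves arriving
# from a given angle, sound pressure sampled over a short time window.
FS = 8000                 # sampling rate [Hz], assumed
N_MICS, N_SAMP = 7, 32    # microphones, time samples per pattern
SPACING = 0.04            # microphone spacing [m], assumed
C = 340.0                 # speed of sound [m/s]

def pattern(freq_hz, angle_deg):
    """Temporal-spatial pattern: one row per microphone over the window."""
    t = np.arange(N_SAMP) / FS
    delays = np.arange(N_MICS) * SPACING * np.sin(np.radians(angle_deg)) / C
    return np.sin(2 * np.pi * freq_hz * (t[None, :] - delays[:, None]))

# Training set: the three training frequencies crossed with a grid of
# arrival angles, flattened to one feature vector per network input.
freqs = [425.0, 850.0, 1700.0]
angles = np.arange(-60, 61, 5)
X = np.stack([pattern(f, a).ravel() for f in freqs for a in angles])
y = np.tile(angles, len(freqs))   # target arrival angle per example
```

The key difference from a single-frequency spatial pattern is that each input carries the pressure evolution over time, so one trained network can generalize across frequencies.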
High-resolution ultrasonic pulse-echo measurement is discussed. To achieve higher time resolution, a sensitivity-compensated (SC) signal and a method of band expansion using linear prediction (LP) processing have been proposed. Moreover, two-dimensional direction measurement based on the difference in distances derived using the SC signal has been studied. Because a small error in the distances results in a large error in the estimated direction, high accuracy in distance measurement is required. In this paper, the effectiveness of band expansion using LP processing with an autoregressive model is discussed for two-dimensional direction measurement. Comparison results show that the band expansion method shortens the compressed pulse width to about 18% of its original value, and that the higher-time-resolution signal improves the accuracy of direction measurement.
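The core of AR-based band expansion is linear prediction: coefficients fitted to the observed samples are used to extrapolate the signal beyond its observed span. A minimal sketch, with an illustrative test signal and model order not taken from the paper:

```python
import numpy as np

def ar_coeffs(x, order):
    """Yule-Walker estimate of AR coefficients a[1..order] such that
    x[n] is approximated by sum_k a[k] * x[n-k]."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)]
                  for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def lp_extrapolate(x, order, n_extra):
    """Extend x by linear prediction, the core step of band expansion."""
    a = ar_coeffs(x, order)
    y = list(x)
    for _ in range(n_extra):
        y.append(float(np.dot(a, y[-1:-order - 1:-1])))
    return np.array(y)

# Illustrative echo-like signal: a decaying sinusoid, extrapolated
# by 50% beyond its observed length.
n = np.arange(64)
x = np.exp(-0.02 * n) * np.sin(2 * np.pi * 0.1 * n)
y = lp_extrapolate(x, order=2, n_extra=32)
```

In the paper's scheme the prediction is applied in the frequency domain to extend the usable band; the sketch above shows only the AR fit-and-extrapolate mechanism itself.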
The measurement of distributed force is in high demand in fields such as robotics, where a flexible, nonmetal sensor configuration is often required. In this report, a distributed sensing method using an elastic tube, a small loudspeaker, and a microphone is proposed, in which the position of deformation along the tube is determined from the acoustic characteristics of the tube. The basic properties of the distributed sensor were studied for a 1.2-m-long elastic resin tube with a 10 mm inner diameter. The distribution of deformation was estimated from the power spectrum measured at one end of the tube under white-noise excitation. The position of deformation was successfully detected from the Fourier transform of the power spectrum, and it was also demonstrated that two separate deformed parts could be discriminated.
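The detection principle can be sketched numerically: a deformation at distance d reflects sound, adding an echo with round-trip delay 2d/c; this echo produces a ripple in the power spectrum, and the Fourier transform of the power spectrum (the autocorrelation, by the Wiener-Khinchin theorem) peaks at that delay. All parameter values below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

FS = 48000                 # sampling rate [Hz], assumed
C = 340.0                  # speed of sound [m/s]
D = 0.6                    # assumed deformation position along the tube [m]
rng = np.random.default_rng(1)

delay = int(round(2 * D / C * FS))      # round-trip delay in samples
x = rng.standard_normal(1 << 14)        # white-noise excitation
y = x.copy()
y[delay:] += 0.5 * x[:-delay]           # echo from the deformed section

# Fourier transform of the power spectrum = autocorrelation,
# which peaks at the round-trip delay of the echo.
power = np.abs(np.fft.rfft(y)) ** 2
acorr = np.fft.irfft(power)
lag = int(np.argmax(acorr[10:len(acorr) // 2]) + 10)  # skip zero-lag peak
d_est = lag * C / (2 * FS)              # recovered position [m]
```

Two separate deformations would simply produce two such peaks, which is why discriminating them is possible.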
We present a method for estimating the propagation delay of a direct or reflected wave in a large concrete caisson. When a concrete structure is struck with a small impulse hammer, reflected waves are generated from the structure boundary and from a crack, if one exists. Previously, we assumed that a direct or reflected wave can be represented approximately by a modeled vibration wave, and proposed convolving the hammer force with an impulse response expressed as an exponentially decaying sinusoid. We then used the finite-difference time-domain (FDTD) method to investigate the vibration wave in detail. The results show that the estimation error in detecting a small reflected wave is not negligible; therefore, the sensor output should be modeled as the sum of one significant modeled vibration wave and two smaller modeled vibration waves. The propagation delay of the estimated significant modeled vibration wave equals the value calculated theoretically from the structure's shape. As described herein, the propagation delays of direct and reflected waves were estimated for two real caissons, one situated on land and one in the sea. The estimated propagation delays coincide well with those expected from the shape of the concrete structure.
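The signal model can be sketched directly: an impulse response modeled as an exponentially decaying sinusoid, delayed by the propagation time, is convolved with the hammer force, and the sensor output is the sum of one significant wave and two smaller ones. All parameter values here are illustrative assumptions, not measured values:

```python
import numpy as np

FS = 100_000                       # sampling rate [Hz], assumed
t = np.arange(0, 0.005, 1 / FS)    # 5 ms observation window

def decaying_sinusoid(t, f0, tau, t0):
    """Modeled impulse response: zero before the propagation delay t0,
    then an exponentially decaying sinusoid."""
    h = np.exp(-(t - t0) / tau) * np.sin(2 * np.pi * f0 * (t - t0))
    return np.where(t >= t0, h, 0.0)

# Hammer force modeled as a short half-sine pulse (assumed shape).
force = np.sin(np.pi * np.arange(30) / 30)

# Sensor output model: one significant wave plus two smaller ones,
# each with its own propagation delay and amplitude (all illustrative).
delays = [0.8e-3, 1.3e-3, 2.1e-3]  # propagation delays [s]
amps = [1.0, 0.3, 0.2]
h = sum(a * decaying_sinusoid(t, f0=4000.0, tau=0.4e-3, t0=d)
        for a, d in zip(amps, delays))
sensor = np.convolve(force, h)[:len(t)]
```

Estimating the delays then amounts to fitting this three-component model to the measured sensor output.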
The jaw is one of the most important articulators in speech production. Despite this, little is known about how Japanese speakers use the jaw to produce vowels. Against this background, in order to explore the articulatory nature of Japanese vocalic jaw movements, this paper presents a detailed, quantitative electromagnetic articulography (EMA) study of the five vowels of Japanese, focusing on four specific questions: (1) By how many millimeters does the jaw open for each vowel in Japanese? (2) Does the presence of an onset consonant affect the degree of jaw opening? (3) Does the speed of the jaw movement vary depending on how much the jaw opens? (4) What is a reliable acoustic correlate of jaw opening? In answer to these questions, the current experiment demonstrates that (1) in Japanese, the degree of jaw opening follows the order [a] > [e] > [o] > [i] > [u]; (2) the presence of the onset consonant [p] generally decreases jaw opening; (3) the degree of jaw opening and its speed are positively correlated; and (4) F1 and duration are reliable acoustic correlates of jaw opening. Implications of these results for phonetic theories are discussed throughout the paper.