Neural networks have proven valuable for estimating the Direction-of-Arrival (DoA) of acoustic signals because they can overcome the accuracy and robustness limitations inherent to conventional estimation methods when dealing with acoustic phenomena. This paper presents a system based on the processing of the acoustic intensity formulated in spherical coordinates. Because features of this type are omnidirectional and spherically structured, a spherical convolutional neural network architecture is used to estimate the DoA as a regression task. A series of experimental tests based on angular error has been performed to examine the accuracy as a function of reverberation and noise, demonstrating the degree to which the proposed method is robust compared with some state-of-the-art methods.
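The abstract evaluates DoA estimates by angular error but does not give the formula. As a minimal sketch, assuming the error is the great-circle angle between the true and estimated directions (a common choice, not confirmed by the source), it could be computed as follows. The function name `angular_error_deg` and the azimuth/elevation convention in degrees are illustrative assumptions.

```python
import math


def angular_error_deg(az_true, el_true, az_est, el_est):
    """Great-circle angle (degrees) between two directions.

    Assumption: directions are given as azimuth and elevation in degrees;
    the paper's own convention is not stated in the abstract.
    """
    a1, e1, a2, e2 = map(math.radians, (az_true, el_true, az_est, el_est))
    # Spherical law of cosines applied to the two unit direction vectors.
    cos_angle = (math.sin(e1) * math.sin(e2)
                 + math.cos(e1) * math.cos(e2) * math.cos(a1 - a2))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))
```

Under this definition, two directions on the horizontal plane that differ by 90 degrees in azimuth give an error of 90 degrees, and the error at the poles does not depend on azimuth.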
Since convolution operators characterize linear time-invariant (LTI) systems, convolution is one of the most fundamental operations in acoustic signal processing. Likewise, the analysis and processing of acoustic signals in the time-frequency domain is commonplace. In this context, LTI systems are often approximated with simple time-frequency domain operations, such as channel-wise multiplicative weights or channel-wise convolution. On the other hand, accurate computation of LTI systems in the time-frequency domain usually requires crossband filters. However, algorithms for connecting LTI systems and the corresponding crossband filters have not been well established, which has hindered their practical application. In this paper, we introduce and compare several algorithms for computing the crossband filter representation of a given LTI system. Furthermore, we propose an inverse scheme for converting crossband filters into LTI systems, which is proven to recover the correct LTI system from its crossband filter representation. Efficient algorithms for this conversion are presented. We demonstrate numerically that the conversion of crossband filters into LTI systems is faithful: if the crossband filter represents an LTI system, then the conversion recovers the corresponding impulse response, confirming our theoretical result. Finally, we compare the computation time of forward and inverse conversion for all the presented algorithms.
Modelling traffic noise propagation behind buildings has been a challenging issue for environmental noise assessment. As a solution, the ASJ RTN-model provides a straightforward and powerful calculation model of road traffic noise, which includes an effective calculation method for predicting the noise propagation behind buildings with complex conditions. However, the model is limited to fixed frequency characteristics of the noise source. To extend the current model to a frequency-dependent prediction model, we report an attempt at constructing a prediction model that accounts for different frequency bands on the basis of scale model experiments. Furthermore, we establish a parameter synthesis method that enables the prediction model to be applied to various frequency characteristics of the noise source.
Advancements in determined blind source separation (BSS) have been achieved through two approaches: design of better source models and derivation of better optimization algorithms. This paper proposes novel BSS algorithms based on the alternating direction method of multipliers (ADMM) to easily incorporate additional constraints and regularization terms into a source model, which provides more flexibility for the source model design. In addition, the structure of spatially whitened signals is effectively utilized to simplify the computation and speed up the ADMM algorithm. In experiments, we applied the proposed ADMM algorithm to the source model of independent vector analysis and compared it with the majorization-minimization (MM) algorithms. Experimental results showed that the proposed ADMM algorithm with the speed-up technique can achieve performance and convergence speed comparable to the state-of-the-art MM algorithm. Our MATLAB code is available at https://github.com/WATARAI-Hiroko/ADMM-IVA.
This article reports on measurements of the near-field head-related transfer functions (HRTFs) of a head and torso simulator (HATS) conducted at NHK Science and Technology Research Laboratories. The measurements included 27 source distances ranging from 0.20 m to 1.50 m at 0.05 m intervals, and 865 directions at each distance. It is desirable to use a sufficiently small sound source that has a wide frequency range such that it approximates a point source with omnidirectional characteristics for the measurements of near-field HRTFs. This article introduces the measurement system constructed in an acoustic anechoic chamber at NHK Science and Technology Research Laboratories and the compact loudspeaker manufactured to meet these requirements. The measurements reveal that, in addition to the overall decrease in the HRTF magnitudes with distance, a sharp attenuation of the low-frequency magnitude could be seen on the ipsilateral side of the sound source, and peaks and notches observed in the spectra varied in their depth and frequencies. These tendencies are consistent with previous reports on the near-field HRTFs.
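The measurement grid described above can be checked with a short calculation. This is only a sketch of the arithmetic: the variable names are illustrative, and the total count is derived from the figures in the abstract rather than quoted from it.

```python
# Work in integer centimetres so the 0.05 m step does not accumulate
# floating-point rounding error.
start_cm, stop_cm, step_cm = 20, 150, 5

distances_m = [d / 100 for d in range(start_cm, stop_cm + 1, step_cm)]
n_distances = len(distances_m)          # 0.20 m to 1.50 m inclusive

n_directions = 865                      # directions per distance (from the abstract)
n_positions = n_distances * n_directions

print(n_distances, n_positions)
```

The 0.05 m spacing from 0.20 m to 1.50 m inclusive gives (1.50 - 0.20) / 0.05 + 1 = 27 distances, which matches the stated count, and 27 × 865 = 23,355 source positions in total.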
Several attempts have been made to quantify the diffuseness of a sound field from various perspectives. In this study, we focus on the two diffusion indices determined from incidence directivity: directional diffusion coefficient and isotropy indicator. We revise their definitions in the light of physical comprehensibility and also introduce a normalization procedure. These revisions bring an explicit relationship between the indices, which provides their consistent interpretation. Moreover, we conduct numerical examinations using an existing incidence directivity analysis and determine the behaviors of the revised indices in a room with different absorption conditions.
Emotional responses to forest sounds among young Japanese women were examined by comparing birdsong with sounds produced by a potentially dangerous insect. The birdsong of four distinct species was rated significantly more pleasant and less arousing than the sound of an Asian giant hornet. Skin blood flow, an indicator of sympathetic nervous activity, decreased during exposure to the hornet sound, whereas no significant change was observed in response to birdsong. These findings suggest that young women show similarly positive subjective responses to the acoustically distinct songs of forest bird species and exhibit an adaptive physiological response to the hornet sound.
This study investigates changes in the rhythmic characteristics of Japanese speech induced by delayed auditory feedback (DAF). Ten native speakers read sentences under four DAF conditions (0, 100, 200, and 300 ms), and interval durations (consonant, vowel, and mora) were measured. Speech rhythm indices were evaluated using Mean (average duration), Delta (absolute variability), and Varco (relative variability). DAF increased Mean and Delta in consonant and vowel durations, indicating greater temporal variability at the phoneme level. However, the relative isochrony of morae, as quantified by the Varco index for mora duration (VarcoM), remained stable, suggesting mora timing tends to be preserved under feedback perturbations.
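The three rhythm indices can be illustrated with a small sketch. It assumes the conventional definitions from the speech rhythm literature: Delta is the standard deviation of interval durations, and Varco is Delta divided by Mean, times 100. The abstract does not say whether a population or sample standard deviation was used, so the population form here is an assumption, and the durations in the usage note are made-up values, not data from the study.

```python
import statistics


def rhythm_indices(durations_ms):
    """Return (Mean, Delta, Varco) for a list of interval durations.

    Assumptions: Delta is the population standard deviation and
    Varco = 100 * Delta / Mean; the study's exact choices are not
    given in the abstract.
    """
    mean = statistics.fmean(durations_ms)
    delta = statistics.pstdev(durations_ms)
    varco = 100.0 * delta / mean
    return mean, delta, varco


# Hypothetical mora durations (ms), then the same sequence slowed by 50%.
baseline = rhythm_indices([100, 120, 80, 100])
slowed = rhythm_indices([150, 180, 120, 150])
```

When every interval is stretched by the same factor, Mean and Delta both grow while Varco is unchanged. This shows how a stable VarcoM can coexist with increases in absolute measures, although the study's actual durations need not scale uniformly.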