Non-negative matrix factorization (NMF) has been one of the most useful techniques for musical signal analysis in recent years. In particular, supervised NMF, in which a large number of instrumental samples are used for the analysis, is garnering much attention with respect to analytical accuracy and speed. The accuracy, however, deteriorates if the system does not have enough samples. Therefore, in principle, such methods require as many samples as possible in order for the analysis to be accurate. In this paper, we propose an analysis method that 1) does not require the collection of a large number of training samples, and 2) combines the NMF and probabilistic approaches. In this approach, it is assumed that each instrumental category has a model-invariant feature, called a probabilistic spectral envelope (PSE). As an extension of a spectral envelope, this feature represents the probabilities of spectral envelopes belonging to the instrumental category in a two-dimensional (frequency-amplitude) space. The analysis of an input musical signal is carried out using a supervised NMF framework, where the basis matrix contains the optimum spectra that have been generated from pretrained PSEs.
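The supervised NMF step described above, in which the basis matrix is fixed and only the activations are estimated, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `supervised_nmf`, the Euclidean cost, the multiplicative update rule, and the toy matrices are all assumptions chosen for illustration, and the PSE-based generation of the basis spectra is not modeled.

```python
import numpy as np

def supervised_nmf(V, W, n_iter=2000, eps=1e-12, seed=0):
    """Estimate non-negative activations H so that V is approximately W @ H.

    V: (n_freq, n_frames) non-negative magnitude spectrogram.
    W: (n_freq, n_bases) fixed, pretrained basis spectra (never updated here).
    """
    rng = np.random.default_rng(seed)
    # Random strictly positive initialization of the activations.
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative update for the Euclidean cost; W is held fixed,
        # which is what makes the factorization "supervised".
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

# Toy data (hypothetical): two basis spectra over three frequency bins.
W = np.array([[1.0, 0.0],
              [0.5, 0.2],
              [0.0, 1.0]])
H_true = np.array([[1.0, 0.5, 2.0],
                   [0.3, 3.0, 1.0]])
V = W @ H_true

H_est = supervised_nmf(V, W)
reconstruction_error = np.linalg.norm(V - W @ H_est)
```

Because the update is multiplicative and the initialization is positive, the activations stay non-negative throughout; in the proposed method the columns of `W` would instead be spectra generated from the pretrained PSEs.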
It is clear that applications such as virtual auditory displays can be achieved by synthesizing head-related transfer functions (HRTFs) with high accuracy. However, in practice, their detailed spectral shapes over the entire frequency range are not likely to be essential to sound localization. For example, low-frequency sound tends to diffract around the head and torso of the listener, so the low-frequency characteristics of HRTFs are largely independent of the position of the sound source. This may imply that HRTFs contain few localization cues in the low-frequency region. We have previously sought to clarify whether removing spectral cues from HRTFs affects horizontal localization [K. Watanabe et al., Acoust. Sci. & Tech., 32(3), 121–124 (2011)]. In this paper, the low-frequency characteristics of the HRTFs were flattened below a certain frequency, termed the "boundary frequency," to investigate the influence of the low-frequency component of HRTFs. The flattening was applied simultaneously to the HRTFs of both ears while the interaural level and time differences of the original HRTFs were retained. A localization test using such partially flattened HRTFs was carried out, and the results showed that flattening the HRTFs did not significantly affect sound localization at boundary frequencies of 0.5–2 kHz, except for the source direction of 60°. These boundary frequencies differed depending on the source direction. The low-frequency region below these boundary frequencies may be ignored, and some data reduction can be applied there without significant influence on sound localization.
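As a rough illustration of flattening below a boundary frequency, the sketch below sets the magnitude of one ear's response to a constant below the boundary while leaving the phase untouched. The function name `flatten_below`, the choice of flat level (the magnitude of the first bin at or above the boundary), and the random test signal are assumptions; the abstract does not specify how the flat level was chosen or how the interaural level and time differences were preserved, so this is not the procedure used in the study.

```python
import numpy as np

def flatten_below(hrir, fs, boundary_hz):
    """Return an impulse response whose magnitude is flat below boundary_hz.

    hrir: 1-D head-related impulse response for one ear.
    fs: sampling rate in Hz.
    """
    n = len(hrir)
    spec = np.fft.rfft(hrir)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    low = freqs < boundary_hz

    mag = np.abs(spec)
    phase = np.angle(spec)
    # Assumption: use the magnitude of the first bin at or above the
    # boundary as the constant level for the flattened region.
    level = mag[~low][0]
    mag[low] = level
    # The phase is left unchanged, so only the magnitude shape is altered.
    return np.fft.irfft(mag * np.exp(1j * phase), n=n), level

# Hypothetical stand-in for a measured response: 256 random samples.
rng = np.random.default_rng(0)
fs = 48000
hrir = rng.standard_normal(256)

flat_hrir, level = flatten_below(hrir, fs, boundary_hz=1000.0)

# Inspect the result: magnitudes below the boundary are now constant.
freqs = np.fft.rfftfreq(len(hrir), d=1.0 / fs)
low = freqs < 1000.0
flat_mag = np.abs(np.fft.rfft(flat_hrir))
orig_mag = np.abs(np.fft.rfft(hrir))
```

Applying the same function to the left-ear and right-ear responses would flatten both, but an additional step (not shown) would be needed to guarantee that the original interaural level difference is retained.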
Computational modeling of the speech organs can improve our understanding of human speech motor control. To investigate muscle activation in speech motor control, we have developed an automatic estimation method based on a 3D physiological articulatory model. In this method, the articulatory target was defined by the entire posture of the tongue and jaw in the midsagittal plane, which was reduced to a six-dimensional space by principal component analysis (PCA). In the PCA space, the distance between an articulatory target and the model was gradually minimized by automatically adjusting muscle activations. The adjustment of muscle activations was guided by a dynamic PCA workspace that was used to predict individual muscle functions in a given position. This dynamic PCA workspace was estimated by interpolating eight reference PCA workspaces. The proposed method was assessed by estimating muscle activations for five Japanese vowel postures that were extracted from magnetic resonance images. The results showed that the proposed method can generate muscle activation patterns that control the model to realize the given articulatory targets. In addition, the estimated muscle activation patterns were consistent with anatomical knowledge and previously reported measurement data.
To investigate the effects of wind turbine noise, a research project has been conducted over three years from fiscal year 2010 under the sponsorship of the Ministry of the Environment, Japan. One of the key aims of this study was to examine the effects of low-frequency components contained in wind turbine noise, and a series of auditory experiments has been conducted using an experimental facility composed of two adjacent rooms, in which low-frequency sounds down to the infrasound range could be produced. As the first experiment using this facility, human hearing thresholds for pure tones were examined in the frequency range from 10 Hz to 200 Hz with 97 participants ranging in age from 20 to 60 years.