Within the cochlea, broadband sounds like speech and music are filtered into a series of narrowband signals, each with a relatively slowly varying envelope (ENV) imposed on a rapidly oscillating carrier (the temporal fine structure, TFS). Information about ENV and TFS is conveyed in the timing and short-term rate of action potentials in the auditory nerve. This paper describes the role of ENV and TFS information in pitch perception, binaural processing, and the perception of speech in the presence of background sounds. The paper also describes the effects of hearing loss and age on the processing of TFS and ENV information. The monaural and binaural processing of TFS information is adversely affected by both hearing loss and increasing age. The monaural processing of ENV information is little affected by hearing loss or by increasing age. The binaural processing of ENV information deteriorates somewhat with increasing age but is not markedly affected by hearing loss. The reduced TFS processing abilities found for older/hearing-impaired subjects may partially account for the difficulties that such subjects experience in complex listening situations.
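One common signal-processing convention defines ENV and TFS through the Hilbert analytic signal. The sketch below makes the distinction concrete by splitting a single narrowband signal into a Hilbert envelope and a unit-amplitude fine structure. It is only an illustration under that assumption, not a model of cochlear filtering or neural coding. The function names and the 1-kHz carrier with 40-Hz modulation are hypothetical example values.

```python
import numpy as np


def analytic_signal(x):
    """Return the analytic signal of a real 1-D array (FFT-based Hilbert transform)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC bin
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    # negative-frequency bins stay at zero
    return np.fft.ifft(np.fft.fft(x) * weights)


def env_tfs(x):
    """Split a narrowband signal into Hilbert envelope (ENV) and fine structure (TFS)."""
    z = analytic_signal(x)
    env = np.abs(z)                       # slowly varying envelope
    tfs = np.cos(np.angle(z))             # unit-amplitude rapidly oscillating carrier
    return env, tfs


# Hypothetical example: a 1-kHz carrier amplitude-modulated at 40 Hz.
fs = 16000
t = np.arange(1600) / fs                  # 0.1 s, an integer number of cycles
true_env = 1.0 + 0.5 * np.cos(2 * np.pi * 40 * t)
carrier = np.cos(2 * np.pi * 1000 * t)
env, tfs = env_tfs(true_env * carrier)
print(np.max(np.abs(env - true_env)) < 1e-9)   # True
```

For this synthetic signal, the recovered envelope and fine structure match the imposed modulator and carrier. For real band-passed speech, the split depends on the filter bandwidth.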
The speech-based envelope power spectrum model (sEPSM) was developed to predict the speech intelligibility of sounds produced by nonlinear speech enhancement algorithms such as spectral subtraction. It is a linear model whose front-end is a linear, level-independent gammatone (GT) filterbank. It therefore has difficulty evaluating speech presented at low and high sound pressure levels (SPLs) consistently, because speech intelligibility depends on the SPL as well as on the signal-to-noise ratio. In this study, the sEPSM was extended with the dynamic compressive gammachirp (dcGC) auditory filterbank and a "common" normalization factor for the modulation power spectrum component to improve the model's prediction performance. To evaluate the proposed model, we performed subjective experiments on the intelligibility of speech enhanced by spectral subtraction and by a Wiener filter algorithm. We compared the subjective speech intelligibility scores with the objective scores predicted by the proposed dcGC-sEPSM, the original GT-sEPSM, and other well-known conventional measures: the short-time objective intelligibility measure (STOI), the coherence speech intelligibility index (CSII), and the hearing aid speech perception index (HASPI). The results show that the proposed dcGC-sEPSM predicted the subjective results better than the other methods did.
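The abstract does not spell out the sEPSM decision metric. The sketch below assumes the envelope-power SNR formulation of the original sEPSM. In that formulation, the envelope powers of noisy speech and of the noise alone are each normalized by their DC power. Their difference relative to the noise power gives an SNRenv per audio channel and modulation band, and these values are combined by root-sum-of-squares.

The sketch carries several simplifications and assumptions:
- It uses a simple per-signal DC normalization over a single broadband modulation band. This is not the "common" normalization factor proposed in this study.
- The power floor and the clipping of negative differences are assumptions.
- The function names are hypothetical.
- The mapping from SNRenv to percent correct is omitted.

```python
import numpy as np


def normalized_envelope_power(env):
    """AC power of an envelope divided by its DC power (one broadband modulation band)."""
    env = np.asarray(env, dtype=float)
    dc = env.mean()
    return np.mean((env - dc) ** 2) / dc ** 2


def combined_snr_env(p_env_sn, p_env_n, floor=1e-3):
    """Root-sum-of-squares SNRenv over all (audio channel, modulation band) entries.

    p_env_sn: normalized envelope powers of noisy (or processed) speech.
    p_env_n:  normalized envelope powers of the noise alone, same shape.
    """
    p_sn = np.maximum(np.asarray(p_env_sn, dtype=float), floor)  # assumed floor
    p_n = np.maximum(np.asarray(p_env_n, dtype=float), floor)    # avoids division by zero
    # Assumption: negative differences are clipped to zero.
    snr_env = np.maximum(p_sn - p_n, 0.0) / p_n
    return float(np.sqrt(np.sum(snr_env ** 2)))


# Made-up envelope powers for 2 audio channels x 2 modulation bands.
p_sn = np.array([[0.30, 0.20], [0.10, 0.05]])
p_n = np.array([[0.10, 0.10], [0.10, 0.05]])
print(combined_snr_env(p_sn, p_n))  # about 2.236, i.e. sqrt(2**2 + 1**2)
```

Because SNRenv is a ratio of envelope powers, a level-independent front-end yields the same value at any presentation level. This is consistent with the abstract's motivation for replacing the GT filterbank with the level-dependent dcGC filterbank.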
Directivity control of a finite cylindrical loudspeaker array can be applied to various systems, such as personal audio and smart speaker systems. In a conventional directivity control method, the filter coefficients of a cylindrical array can be derived analytically from a cylindrical harmonic expansion when the desired directivity patterns are specified at control points in the cylindrical coordinate system. For practical use, however, it is more convenient to specify the control points in the spherical coordinate system. In this paper, we propose a filter design method that controls the directivity patterns on a spherical surface for a finite cylindrical array. Although the proposed method is similar to the mode-matching method based on spherical harmonic expansion, it uses a combination of circular harmonics and longitudinal multipoles to express the directivity patterns and filter coefficients. To validate the proposed method, we measured the directivity patterns of a prototype 24-element cylindrical loudspeaker array in an anechoic chamber. The measured three-dimensional directivity patterns agreed with those obtained by computer simulations.
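The proposed method expresses directivity patterns partly with circular harmonics. The sketch below illustrates only that azimuthal part. It assumes a desired horizontal-plane pattern D(φ) sampled on a uniform azimuth grid and obtains its circular harmonic coefficients by discrete projection onto e^{inφ}. The longitudinal multipoles, the array's radiation model, and the filter derivation are not shown. The function names are hypothetical.

```python
import cmath
import math


def circular_harmonic_coeffs(pattern, order):
    """Project samples of D(phi) on a uniform azimuth grid onto e^{i n phi}, |n| <= order.

    The grid must have more than 2 * order points to avoid aliasing.
    """
    m = len(pattern)
    coeffs = {}
    for n in range(-order, order + 1):
        acc = 0j
        for k, value in enumerate(pattern):
            phi = 2 * math.pi * k / m
            acc += value * cmath.exp(-1j * n * phi)
        coeffs[n] = acc / m
    return coeffs


def evaluate(coeffs, phi):
    """Rebuild D(phi) from its truncated circular harmonic series."""
    return sum(c * cmath.exp(1j * n * phi) for n, c in coeffs.items()).real


# Hypothetical target: a cardioid D(phi) = 0.5 + 0.5 cos(phi), sampled at 36 azimuths.
samples = [0.5 + 0.5 * math.cos(2 * math.pi * k / 36) for k in range(36)]
coeffs = circular_harmonic_coeffs(samples, order=2)
print(round(coeffs[1].real, 6))  # 0.25
```

A cardioid needs only orders 0 and ±1. Sharper beams require higher orders and therefore more loudspeakers around the circumference.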
Japanese distinguishes long and short vowels. While duration is the primary cue for listeners, pitch serves as a secondary cue when duration becomes ambiguous. However, duration is affected by the phonetic environment, so pitch cues may be more important in everyday conversation. Ageing is also known to affect speech recognition, in particular the discrimination of pitch contours such as lexical tones. The current study compared a group of 15 young listeners with 14 elderly listeners using the words "obasan" (aunt) and "obaasan" (grandmother), each manipulated in six steps along the duration and pitch dimensions. We found that elderly listeners used pitch more than young listeners did at the duration extremes, suggesting a generational effect on the acceptability of pitch accent patterns (or of their absence). We also observed that half of the elderly listeners were less sensitive to pitch when duration became unreliable, depending on their fundamental frequency difference limens. The more sensitive elderly listeners, who performed similarly to the younger participants, differed significantly in their perception results from the less sensitive elderly listeners. This suggests that pitch deficits are present in half of the elderly group with near-normal hearing, preventing them from using pitch cues as effectively as their younger counterparts.
In this work, we built a disc-shaped ultrasonic transducer for targeted neuromodulation, fitted with a solid axicon lens based on a polydimethylsiloxane (PDMS) interface. We characterized its acoustic field numerically and experimentally. The motor cortex of CF-1 mice was stimulated with low-intensity pulsed ultrasound delivered through the skin and skull into the intact brain. Evoked muscle responses were clearly observed in different body segments, including the hindlimb, forelimb, and tail. The axicon lens affixed to the face of the transducer makes it possible to target the motor cortex with pulsed ultrasound and induce muscle contraction in a specific body segment. With this approach, the lateral and axial spatial resolution is comparable to that of spherical-segment ultrasound transducers, but with a shorter focal length. Ultrasound axicons therefore appear attractive for investigating the functional contributions of fine-grained spatial structures in the brain.
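The abstract compares the lateral and axial resolution with that of spherical-segment transducers but does not state the underlying relations. The sketch below is an idealized estimate, not the authors' numerical model. It assumes the axicon lens produces an ideal Bessel beam with cone angle β in a homogeneous medium. Under that assumption, the central-lobe radius follows from the first zero of J0, and the length of the non-diffracting zone follows from the geometric relation R / tan(β). The function name and example parameters are hypothetical, and skull attenuation, PDMS refraction geometry, and near-field effects are ignored.

```python
import math

J0_FIRST_ZERO = 2.404825557695773  # first root of the Bessel function J0


def bessel_beam_estimates(freq_hz, c_medium, cone_angle_deg, aperture_radius):
    """Ideal Bessel-beam estimates for an axicon with cone angle beta.

    Returns (r0, z_max):
      r0    -- radius of the first zero of J0(k sin(beta) r), the central-lobe radius
      z_max -- geometric depth of the non-diffracting zone, R / tan(beta)
    Lengths are in metres.
    """
    beta = math.radians(cone_angle_deg)
    k = 2 * math.pi * freq_hz / c_medium          # wavenumber in the medium
    r0 = J0_FIRST_ZERO / (k * math.sin(beta))     # lateral extent of the central lobe
    z_max = aperture_radius / math.tan(beta)      # axial extent of the Bessel zone
    return r0, z_max


# Hypothetical values: 500 kHz in a water-like medium, beta = 30 deg, R = 5 mm.
r0, z_max = bessel_beam_estimates(500e3, 1500.0, 30.0, 0.005)
print(f"{r0 * 1e3:.2f} mm, {z_max * 1e3:.2f} mm")  # 2.30 mm, 8.66 mm
```

Increasing β narrows the central lobe and shortens the zone, which is the resolution versus focal-depth trade-off an axicon designer tunes.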
A spatialization method for loudspeakers arranged in a ring layout (such as the one specified in ITU recommendation ITU-R BS.775) is presented. The proposed method can be described as an improvement to amplitude panning methods in which a frequency-dependent gain is applied to mitigate the effect of the loudspeaker locations on the desired signal. By grouping the side loudspeakers, elevated sources can be rendered, which is otherwise difficult to achieve with single-layer loudspeaker arrays. Objective evaluations showed that the proposed method produces less spectral distortion than 2D-VBAP, but at the cost of poorer cross-talk cancellation. Experimental results suggest that, despite these differences in cross-talk cancellation, the proposed method yields accurate azimuth and elevation estimates of sound sources in both anechoic and echoic conditions, except for sources located below ear level.
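For reference, the 2D-VBAP baseline that the proposed method is compared against works on the loudspeaker pair enclosing the source direction. It finds the gains whose weighted loudspeaker unit vectors sum to the source direction, then normalizes them for constant power. The sketch below covers only that baseline, not the proposed frequency-dependent method. The function name is hypothetical. Pair selection is left out: in full 2D-VBAP, the pair used is the one for which both gains are non-negative.

```python
import math


def vbap_2d_pair(source_deg, spk1_deg, spk2_deg):
    """2D-VBAP gains for one loudspeaker pair.

    Solves p = g1 * l1 + g2 * l2, where l1 and l2 are the loudspeaker unit
    vectors and p is the virtual-source unit vector, then normalizes the gains
    so that g1**2 + g2**2 == 1. A negative gain means the source lies outside
    this pair.
    """
    def unit(deg):
        a = math.radians(deg)
        return math.cos(a), math.sin(a)

    (l1x, l1y), (l2x, l2y) = unit(spk1_deg), unit(spk2_deg)
    px, py = unit(source_deg)
    det = l1x * l2y - l2x * l1y
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be parallel or opposite")
    # Cramer's rule for the 2x2 system [l1 l2] g = p
    g1 = (px * l2y - l2x * py) / det
    g2 = (l1x * py - px * l1y) / det
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm


# Front pair of a BS.775-style layout at +30 and -30 degrees.
print(vbap_2d_pair(0.0, 30.0, -30.0))   # approximately (0.707, 0.707)
print(vbap_2d_pair(10.0, 30.0, -30.0))  # more gain on the +30 degree loudspeaker
```

Because these gains are frequency-independent, the panned signal inherits the comb-filter-like spectral coloration of two coherent loudspeakers. The proposed method's frequency-dependent gain is aimed at this kind of effect.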