Under certain conditions, sounds actually missing from a speech signal can be synthesized by the brain and clearly heard. This illusory phenomenon, known as the phonemic restoration effect, reveals the sophisticated capability of the brain underlying robust speech perception in noisy situations often encountered in daily life. In this article, basic aspects of the phonemic restoration effect are described with audio demonstrations.
When ‘an ascending frequency glide of 1,500 ms with a temporal gap of 100 ms in the middle’ and ‘a continuous descending frequency glide of 500 ms’ cross each other, the gap is typically perceived as if it were in the shorter glide, which is actually continuous. This is the original pattern showing the gap transfer illusion as discovered in 1992. Some demonstrations that are very close to the demonstrations we made then are presented in this paper. Some suggestions for our future research are also indicated with new demonstrations.
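The stimulus described above can be sketched in a few lines. Only the durations (1,500 ms, 100 ms, 500 ms) come from the text; the sample rate, the frequency ranges, the exponential glide shape, and the helper name `glide` are assumptions made for illustration and are not taken from the original demonstrations.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed, chosen only to keep the example small


def glide(duration_s, f_start, f_end, gap_s=None):
    """Sine glide moving exponentially from f_start to f_end (Hz).

    gap_s=(start, end) silences that interval, in seconds from glide onset.
    """
    n = round(duration_s * SAMPLE_RATE)
    gap = None
    if gap_s is not None:
        gap = (round(gap_s[0] * SAMPLE_RATE), round(gap_s[1] * SAMPLE_RATE))
    samples, phase = [], 0.0
    for i in range(n):
        # Instantaneous frequency, interpolated on a log-frequency axis.
        freq = f_start * (f_end / f_start) ** (i / n)
        phase += 2.0 * math.pi * freq / SAMPLE_RATE
        in_gap = gap is not None and gap[0] <= i < gap[1]
        samples.append(0.0 if in_gap else math.sin(phase))
    return samples


# Long ascending glide: 1,500 ms with a 100 ms gap centred at 750 ms.
long_glide = glide(1.5, 500.0, 2000.0, gap_s=(0.70, 0.80))
# Short descending glide: 500 ms, continuous; assumed to cross at 1,000 Hz.
short_glide = glide(0.5, 1000.0 * 2 ** 0.5, 1000.0 / 2 ** 0.5)

# Align the temporal midpoints so the glides cross inside the gap.
onset = round(0.5 * SAMPLE_RATE)
mix = list(long_glide)
for i, s in enumerate(short_glide):
    mix[onset + i] += s
mix = [0.5 * s for s in mix]  # scale to avoid clipping when saved
```

The list `mix` holds samples in the range -1 to 1; writing it to a WAV file (for example with Python's `wave` module) would make the pattern audible.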
Recent work on the perception of noise-vocoded speech sound (NVSS) has revealed that amplitude envelope information is very important for speech perception when spectral information is not sufficiently available. Basically, fundamental frequency information is not available and formant peaks cannot be identified in NVSS. However, we can even recognize accent and distinguish male voices from female voices in NVSS. Moreover, a melody can be created from lyrics once the lyrics are intelligible. In the present study, findings from fMRI measurements are introduced to show neural activity in the central nervous system while listening to NVSS. The present data indicate that various sites in the brain that are not ordinarily used for speech recognition participate in making NVSS intelligible. Applications of the present work include an innovative speech processor and a training system for hearing-impaired people.
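To illustrate why only envelope information survives in NVSS, the following minimal sketch applies the core step to a single band: the amplitude envelope is extracted and imposed on noise. It is a simplification under stated assumptions. A full noise vocoder would first split the speech into several frequency bands and band-limit the noise, and the function names, the moving-average smoother, and the window length are all hypothetical choices, not the processing used in the study.

```python
import random


def amplitude_envelope(samples, window):
    """Full-wave rectify, then smooth with a trailing moving average."""
    rectified = [abs(s) for s in samples]
    envelope = []
    for i in range(len(rectified)):
        start = max(0, i - window + 1)
        chunk = rectified[start:i + 1]
        envelope.append(sum(chunk) / len(chunk))
    return envelope


def noise_vocode_band(samples, window, seed=0):
    """Replace the fine structure of one band with envelope-modulated noise."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    envelope = amplitude_envelope(samples, window)
    return [e * rng.uniform(-1.0, 1.0) for e in envelope]


# Toy input: a burst of signal followed by silence.
band = [0.8, -0.6, 0.7, -0.9, 0.0, 0.0, 0.0, 0.0]
vocoded = noise_vocode_band(band, window=2)
```

The output carries no periodicity of its own, which is consistent with the statement that fundamental frequency information is not available in NVSS.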
When two or more tones are presented simultaneously, a listener can sometimes hear other tones that are not physically present. These other tones, called ‘combination tones’, are thought to be induced by nonlinear activity in the inner ear. The phenomenon is difficult to demonstrate because a listener cannot easily distinguish combination tones from primary tones. This paper introduces a unique method called the ‘sweep tone method’, by which combination tones can be perceptually distinguished from primary tones relatively easily. The importance of the nonlinear characteristics of the intact auditory system is described.
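As a small worked example, the two combination tones most often reported for a pair of primaries f1 < f2 are the difference tone f2 − f1 and the cubic difference tone 2·f1 − f2. The sketch below computes them while one primary is swept. The sweep shown is only one plausible reading of a sweep-based demonstration and is an assumption; the abstract does not describe the actual procedure, and the function name and frequency values are hypothetical.

```python
def combination_tones(f1, f2):
    """Return the difference and cubic difference tones for two primaries (Hz)."""
    low, high = sorted((f1, f2))
    return {
        "difference": high - low,            # f2 - f1
        "cubic_difference": 2 * low - high,  # 2*f1 - f2
    }


# Assumed demonstration: hold the lower primary fixed, sweep the upper one up.
fixed_primary = 1000
sweep = [
    (upper, combination_tones(fixed_primary, upper)["cubic_difference"])
    for upper in range(1100, 1500, 100)
]
```

In this sketch the cubic difference tone falls as the swept primary rises, so a component moving in the opposite direction to the sweep cannot be one of the primaries.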
Music performers usually play with particular intentions. They lengthen or shorten notes and play them louder or softer for expressive effect. Furthermore, they move their bodies, consciously or unconsciously, to enhance expression. Singers often show specific emotions on their faces when singing before an audience. Visual information from performers plays an important role in helping listeners understand the performers’ intentions. Our studies in this field and a pioneering work by Davidson are reviewed.
A virtual auditory display (VAD) is a system for presenting spatialized sound to a listener. Commonly, VAD techniques are based on convolving a sound-source signal with head-related transfer functions (HRTFs). When the HRTFs in a VAD are not fitted to the specific listener, localization accuracy is often low, and the resulting large localization errors typically appear as frequent front-back confusion. However, measuring HRTFs for each listener in all sound-source directions requires a special measuring apparatus and a long measurement time, which imposes a physical load on the listener. The author has therefore proposed a method for individualizing HRTFs called the Determination method of OptimuM Impulse-response by Sound Orientation (DOMISO). In this paper, DOMISO and its effects are introduced.
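The convolution step at the heart of a VAD can be sketched as follows. In the time domain, the source signal is convolved with a pair of head-related impulse responses (the time-domain counterparts of HRTFs), one per ear. The impulse responses below are toy values invented for illustration, not measured data, and the function names are hypothetical; the sketch does not implement DOMISO itself.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def spatialize(mono, hrir_left, hrir_right):
    """Return (left, right) ear signals for one source direction."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses (assumed): source on the listener's left, so the
# right ear receives the sound two samples later and at half the level.
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.5]

left, right = spatialize([1.0, 2.0], hrir_left, hrir_right)
```

A practical system would use measured or individualized impulse responses of a few hundred samples and a fast (FFT-based) convolution, but the structure is the same.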
Many demonstrations for education in acoustics have been developed in Japan as well as outside the country. Since 1997, the Technical Committee on Education in Acoustics of the Acoustical Society of Japan has been investigating and discussing education in acoustics in Japan. In this review, some of the educational tools and demonstrations in acoustics are introduced. They are all designed to help us visualize and hear different phenomena and to understand abstract theories in a more intuitive way. The work that has been carried out includes some exciting demonstrations in acoustics by the high-school physics teachers’ “Stray Cats Group,” some visual and aural demonstrations for architectural acoustics, a technical course called “Technical Listening Training,” a WWW-based training system, and physical models of the human vocal tract.
STRAIGHT, a speech analysis, modification, and synthesis system, is an extension of the classical channel VOCODER that exploits the advantages of progress in information processing technologies and a new conceptualization of the role of repetitive structures in speech sounds. This review outlines the historical background, architecture, underlying principles, and representative applications of STRAIGHT.
In this tutorial paper, the author introduces a full physical surround-sound system in a single equipment box, focusing on its background, novel technology, and its application. Working on the principles of phased-array antennas commonly used for electromagnetic waves, but adapted for the wide-bandwidth requirements of audio acoustics, Digital Delay Arrays (DDA) simultaneously produce multiple independently-steered and -focused beams of sound each potentially carrying different audio programme material. Utilising the available reflective surfaces (e.g. ceiling and walls) in nearly all domestic listening environments, these distinct beams may each be arranged to reach the listeners from different directions, thus producing surround-sound. The basic signal processing requirements as well as several refinements are described, along with a discussion of the major design parameters of practical uniform array antennas, with extensions to non-uniform and non-planar array structures.
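The steering and focusing described above rest on per-element time delays. The following is a minimal sketch using the textbook delay-and-sum relations for a uniform line array; it is not the signal processing of the system described, and the function names, element spacing, and speed-of-sound value are assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed)


def steering_delays(num_elements, spacing_m, angle_deg, c=SPEED_OF_SOUND):
    """Delays (s) that steer a plane-wave beam angle_deg off broadside."""
    theta = math.radians(angle_deg)
    raw = [n * spacing_m * math.sin(theta) / c for n in range(num_elements)]
    earliest = min(raw)
    return [t - earliest for t in raw]  # shift so no delay is negative


def focusing_delays(element_x_m, focus_x_m, focus_y_m, c=SPEED_OF_SOUND):
    """Delays (s) that make all element signals arrive at a focal point together."""
    distances = [math.hypot(focus_x_m - x, focus_y_m) for x in element_x_m]
    farthest = max(distances)
    # The farthest element fires first; nearer elements wait.
    return [(farthest - d) / c for d in distances]


broadside = steering_delays(4, 0.05, 0.0)
steered = steering_delays(4, 0.05, 30.0)
focused = focusing_delays([-0.1, 0.0, 0.1], 0.0, 1.0)
```

Because the delays are applied per beam before summation at each transducer, several beams with different delay sets can share one array, which is what allows each beam to carry different programme material.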
The relationship between the viscous boundary layer and the resonance frequency of the sound generated in a loop-tube-type thermoacoustic cooling system is investigated. The frequency of the sound was observed for various loop-tube lengths, inner pressures, and working fluids, and the influence of the viscous boundary layer on the resonance frequency is discussed. It has generally been assumed that the sound generated in the loop tube resonates with one wavelength fitting the tube length. Under certain conditions, however, the resonance is a two-wavelength resonance. It is found that the loop tube selects the resonance frequency so that the thickness of the viscous boundary layer is smaller than the stack channel radius, and this is why the two-wavelength resonance appears under those conditions. The frequency is an important parameter for a thermoacoustic cooling system, and the results obtained identify one of the factors that determine it.
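The selection rule stated above can be illustrated numerically. The sketch uses the standard expression for the viscous boundary layer thickness, δ = sqrt(2ν/ω), with ν the kinematic viscosity and ω the angular frequency, and it reads the one- and two-wavelength resonances as modes in which one or two wavelengths fit the loop length. The mode-picking function, the loop length, the channel radii, and the air properties are assumptions for illustration, not values or procedures from the study.

```python
import math


def viscous_boundary_layer_thickness(kinematic_viscosity, frequency_hz):
    """delta = sqrt(2 * nu / omega), in metres."""
    omega = 2.0 * math.pi * frequency_hz
    return math.sqrt(2.0 * kinematic_viscosity / omega)


def loop_mode_frequency(sound_speed, loop_length_m, wavelengths_in_loop):
    """Frequency at which the given number of wavelengths fits the loop."""
    return wavelengths_in_loop * sound_speed / loop_length_m


def select_mode(sound_speed, loop_length_m, kinematic_viscosity,
                channel_radius_m, max_mode=4):
    """Lowest mode whose boundary layer is thinner than the channel radius."""
    for n in range(1, max_mode + 1):
        f = loop_mode_frequency(sound_speed, loop_length_m, n)
        delta = viscous_boundary_layer_thickness(kinematic_viscosity, f)
        if delta < channel_radius_m:
            return n, f
    return None


# Illustrative values: air at atmospheric pressure, 3.43 m loop.
AIR_SOUND_SPEED = 343.0      # m/s (assumed)
AIR_KIN_VISCOSITY = 1.5e-5   # m^2/s (assumed)

wide_channel = select_mode(AIR_SOUND_SPEED, 3.43, AIR_KIN_VISCOSITY, 0.3e-3)
narrow_channel = select_mode(AIR_SOUND_SPEED, 3.43, AIR_KIN_VISCOSITY, 0.2e-3)
```

With these numbers the boundary layer at the one-wavelength frequency (about 100 Hz) is roughly 0.22 mm thick, so the 0.3 mm channel admits the one-wavelength mode while the 0.2 mm channel pushes the sketch to the two-wavelength mode.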
The reverberation time in a room with unevenly distributed sound absorbers, such as a room having an absorptive floor and/or ceiling, is often observed to be longer in the middle- and high-frequency ranges than the values obtained using the Sabine/Eyring formula. In the present study, this phenomenon was investigated through a scale-model experiment and three-dimensional wave-based numerical analysis. The reverberation time in a room having an absorptive floor and/or ceiling was verified to be longer in the middle- and high-frequency ranges, and the arrangement of absorbers was found to affect the frequency characteristic of the reverberation time. The increase in the reverberation time is caused by the slow decay of the axial and tangential modes in the horizontal direction. The reverberation time is longer in the high-frequency range (in which the wavelength is sufficiently short compared with the height of the ceiling) than in the low-frequency range, even when the frequency characteristics of the absorption coefficients of the absorbers are flat. As a means of improving such an uneven reverberation time in a room, both the placement of diffusers in the vertical direction and the use of inwardly inclined walls (in rooms with highly absorptive floors) have been found to be effective.
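For reference, the Sabine and Eyring predictions that the measured values are compared against can be computed as below. Both formulas assume a diffuse sound field, which is exactly the assumption that breaks down when absorption is concentrated on the floor and ceiling. The room dimensions and absorption coefficients are hypothetical values chosen for illustration, not data from the study.

```python
import math

SABINE_CONSTANT = 0.161  # s/m, for air at room temperature


def sabine_rt(volume_m3, surfaces):
    """Sabine: T = 0.161 V / sum(S_i * alpha_i); surfaces = [(area, alpha), ...]."""
    absorption_area = sum(area * alpha for area, alpha in surfaces)
    return SABINE_CONSTANT * volume_m3 / absorption_area


def eyring_rt(volume_m3, surfaces):
    """Eyring: T = 0.161 V / (-S ln(1 - mean alpha))."""
    total_area = sum(area for area, _ in surfaces)
    mean_alpha = sum(area * alpha for area, alpha in surfaces) / total_area
    return SABINE_CONSTANT * volume_m3 / (-total_area * math.log(1.0 - mean_alpha))


# Hypothetical 10 m x 8 m x 3 m room: absorptive floor and ceiling, hard walls.
volume = 10.0 * 8.0 * 3.0
room_surfaces = [
    (80.0, 0.60),   # floor
    (80.0, 0.60),   # ceiling
    (108.0, 0.05),  # four walls combined
]

t_sabine = sabine_rt(volume, room_surfaces)
t_eyring = eyring_rt(volume, room_surfaces)
```

Both formulas depend only on the total absorption and not on where it is placed, so neither can capture the slowly decaying horizontal modes that lengthen the measured reverberation time.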
A coding algorithm for speech called harmonic vector excitation coding (HVXC) has been developed that encodes speech at very low bit rates (2.0–4.0 kbit/s). It breaks speech signals down into two types of segments: voiced segments, for which a parametric representation of harmonic spectral magnitudes of LPC residual signals is used; and unvoiced segments, for which the CELP coding algorithm is used. This combination provides near toll-quality speech at 4.0 kbit/s, and communication-quality speech at 2.0 kbit/s, thus outperforming FS1016 4.8-kbit/s CELP. This paper discusses the encoder and decoder algorithms for HVXC, including fast harmonic synthesis, time scale modification, and pitch-change decoding. Due to its high coding efficiency and new functionality, HVXC has been adopted as the ISO/IEC International Standard for MPEG-4 audio.