Three sets of new findings on the modulation of visual perception by auditory stimuli are reviewed. First, we show that visual temporal resolution can be either improved or degraded by accompanying sounds, depending on the sequence of, and delay between, the auditory and visual stimuli. Second, a single visual flash can be perceived as multiple flashes when accompanied by multiple sounds. Third, an ambiguous motion display consisting of two objects moving toward each other is perceived as streaming when accompanied by no sound or an unsynchronized sound, but as bouncing when accompanied by a synchronized sound. Based on these findings, we argue, against the traditional belief in visual dominance, that audition can modify vision, particularly when it provides strong transient signals.
It is well known that there is a point-to-point map of auditory space in the midbrain: each neuron is tuned to a particular sound-source location, and neurons’ preferred locations are topographically represented in a neural structure. In the auditory cortex, however, researchers have consistently failed to demonstrate evidence for such an auditory space map, despite the well-known necessity of the auditory cortex for normal sound localization. Cortical neurons show generally broad spatial tuning, and their preferred locations are not systematically organized on the cortex in a topographical fashion. An alternative hypothesis is presented here: individual single neurons represent auditory space panoramically through space-specific characteristics of their spike patterns. Information about any particular sound-source location is distributed across a large population of neurons, and accurate localization judgements can be predicted by combining information across those neurons. In our analyses of experimental data using an artificial neural network algorithm, we were able to recognize the spike patterns of single neurons and thereby identify sound-source locations throughout 360° of space. The amount of information carried by a moderate-sized neural ensemble appeared sufficient to account for the accuracy of location judgements by behaving animals.
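The population-decoding idea in this abstract, that no single broadly tuned neuron pinpoints a location but a read-out combining many of them can, may be sketched with a minimal template-matching decoder. Everything here (the neuron count, the cosine tuning, the decoder itself) is an illustrative assumption, not the authors' neural-network analysis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions: 20 broadly tuned neurons, 36 candidate
# source locations in 10-degree steps around the full circle.
n_neurons, n_locations = 20, 36
angles = np.deg2rad(np.arange(0, 360, 10))

# Broad cosine spatial tuning with random preferred directions: any
# single neuron is ambiguous about location, but the population is not.
preferred = rng.uniform(0, 2 * np.pi, n_neurons)
templates = 5 + 4 * np.cos(angles[:, None] - preferred[None, :])

def decode(response):
    """Template matching across the population: return the index of the
    location whose mean population response best matches the observed one."""
    return int(np.argmin(np.sum((templates - response) ** 2, axis=1)))

# A noiseless response is decoded exactly; a Poisson-noisy trial from
# the same location is usually decoded to it or a nearby location.
trial = rng.poisson(templates[7])
print(decode(templates[7]), decode(trial))
```

The point of the sketch is only that combining ambiguous single-neuron responses yields an unambiguous location estimate over the full 360° range.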
Physiological and psychophysical evidence for temporal coding of sensory qualities in different modalities is considered. A space of pulse codes is outlined that includes 1) channel codes (across-neuron activation patterns), 2) temporal pattern codes (spike patterns), and 3) spike latency codes (relative spike timings). Temporal codes are codes in which spike timings (rather than spike counts) are critical to informational function. Stimulus-dependent temporal patterning of neural responses can arise extrinsically or intrinsically: through stimulus-driven temporal correlations (phase-locking), response latencies, or characteristic timecourses of activation. Phase-locking is abundant in audition, mechanoception, electroreception, proprioception, and vision. In phase-locked systems, temporal differences between sensory surfaces can subserve representations of location, motion, and spatial form that can be analyzed via temporal cross-correlation operations. Within the limits of phase-locking, the patterns of all-order interspike intervals that are produced reflect stimulus autocorrelation functions that can subserve representations of form. Stimulus-dependent intrinsic temporal response structure is found in all sensory systems. Characteristic temporal patterns that may encode stimulus qualities can be found in the chemical senses, the cutaneous senses, and some aspects of vision. In some modalities (audition, gustation, color vision, mechanoception, nociception), particular temporal patterns of electrical stimulation elicit specific sensory qualities.
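The claim that, within phase-locking limits, all-order interspike intervals mirror the stimulus autocorrelation can be illustrated with a toy simulation. The 200 Hz tone, the 30% firing probability per cycle, and the timing jitter are all assumed values chosen only to make the effect visible:

```python
import numpy as np

rng = np.random.default_rng(1)

# A model neuron phase-locked to a 200 Hz tone: it fires near the
# stimulus peak on a random 30% of cycles, with 0.2 ms timing jitter.
period = 0.005  # 200 Hz -> 5 ms
cycles = np.flatnonzero(rng.random(400) < 0.3)
spikes = cycles * period + rng.normal(0.0, 0.0002, cycles.size)

# All-order interspike intervals: differences between every spike pair,
# not just between adjacent spikes.
pairs = np.triu_indices(spikes.size, k=1)
intervals = (spikes[None, :] - spikes[:, None])[pairs]

# The interval histogram concentrates at multiples of the 5 ms stimulus
# period, mirroring the periodicity of the stimulus autocorrelation.
hist, edges = np.histogram(intervals, bins=np.arange(0.0, 0.05, 0.0005))
centers = edges[:-1] + 0.00025
peak_ms = centers[np.argmax(hist)] * 1000
print(round(peak_ms, 2), "ms")
```

Even though the neuron skips most cycles, the interval statistics recover the stimulus periodicity, which is the sense in which the interval pattern "reflects" the autocorrelation function.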
The auditory system consists of the ascending and descending (corticofugal) systems. One of the major functions of the corticofugal system is the adjustment and improvement of auditory signal processing in the subcortical auditory nuclei, i.e., the adjustment and improvement of the input of cortical neurons. The corticofugal system evokes a small, short-term reorganization (plasticity) of the inferior colliculus, medial geniculate body and auditory cortex for acoustic signals repetitively delivered to an animal. When these signals become behaviorally relevant to the animal through conditioning (associative learning), the short-term reorganization is augmented and changes into a long-term reorganization of the auditory cortex. Animals acquire the behavioral relevance of sounds through associative learning. Human babies also acquire language through associative learning. Therefore, the corticofugal system is expected to play a particularly important role in processing behaviorally relevant sounds and in reorganizing the auditory cortex according to the behavioral relevance of sounds. Since the ascending and descending systems form multiple feedback loops, the neural mechanisms for auditory information processing cannot be adequately understood without the exploration of the interaction between the ascending and descending systems.
The discovery of hair cell regeneration in birds a little over a decade ago raises a number of obvious and exciting questions about basic functional and neural plasticity in the vertebrate auditory system. Because many birds must learn the complex, species-specific, acoustic signals they use for communication just as humans must learn the sounds of speech, the finding of hair cell regeneration in birds also raises other interesting questions. One of these questions concerns the relation between hearing loss and vocal production. Another question concerns the effect of full or partial hearing recovery on vocal behavior. The purpose of this paper is to review what is known about the functional (i.e. behavioral) consequences of hair cell loss and subsequent hair cell regeneration in birds, to point out the relevance of this work for human hearing recovery, and to suggest some directions for future research.
Our goal is to develop sound synthesis technology with which users can synthesize arbitrary sound timbres, including musical instrument sounds, natural sounds, and their interpolation/extrapolation, on demand. For this purpose, we investigated sound interpolation based on physical modeling. A sound-synthesis model composed of an exciter, a one-dimensional vibrator, and a two-dimensional resonator is used, and smooth timbre conversion by parameter control is examined. Piano and guitar sounds are simulated using this model, and interpolation between piano and guitar tones is investigated. A strategy for parameter control is proposed, and subjective tests were performed to evaluate the algorithm. A multidimensional scaling (MDS) technique is used, and perceptual characteristics are discussed. One axis of the timbre space is interpreted as spectral energy distribution, so the spectral centroid is used as a reference for adjusting the synthesis parameters. By taking the centroids into account, smooth timbre interpolation is achieved. These results suggest the possibility of developing a morphing system based on a physical model.
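The spectral-centroid reference used above can be computed directly from a magnitude spectrum. A minimal sketch follows; the test signals are arbitrary stand-ins, not the paper's piano and guitar tones:

```python
import numpy as np

def spectral_centroid(signal, sr):
    """Amplitude-weighted mean frequency of the magnitude spectrum,
    a standard correlate of perceived brightness and the timbre-space
    axis used as a reference in the abstract."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sr)
    return np.sum(freqs * spectrum) / np.sum(spectrum)

# Illustrative check: adding a strong high partial raises the centroid.
sr = 44100
t = np.arange(sr) / sr
dull = np.sin(2 * np.pi * 220 * t)                  # energy at 220 Hz only
bright = dull + 0.8 * np.sin(2 * np.pi * 2200 * t)  # extra high partial
print(spectral_centroid(dull, sr) < spectral_centroid(bright, sr))  # True
```

Matching centroids between two synthesis parameter settings, as the abstract describes, then amounts to adjusting parameters until this scalar agrees along the interpolation path.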
Recently, harmonic imaging has been widely used to improve image quality in the field of ultrasonic diagnostics. In this research, the improvement in the quality of ultrasonic B-mode images by tissue harmonic imaging was evaluated by combining the technique with a numerical analysis method. The influence of an inhomogeneous intervening medium on the ultrasonic B-mode images was also considered. In addition, the region of interest (ROI) on the B-mode images was compared with the surrounding region to quantify the contrast improvement based on the relative echo level. The image quality improvement of tissue harmonic imaging was also analyzed in consideration of the signal-to-noise ratio (SNR).
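One plausible form of the relative-echo-level contrast measure mentioned above is an amplitude ratio in decibels between the ROI and its surroundings; the exact definition used in the paper is not given here, so this is an assumed sketch:

```python
import numpy as np

def relative_echo_level_db(roi, surround):
    """Relative echo level (dB) between a region of interest and the
    surrounding region of a B-mode image, assumed here to be the
    20*log10 ratio of mean echo amplitudes; an increase after harmonic
    imaging would indicate improved contrast."""
    return 20.0 * np.log10(np.mean(np.abs(roi)) / np.mean(np.abs(surround)))

# Illustrative patches: an ROI whose echoes are 10x the surround
# amplitude yields a relative echo level of +20 dB.
roi = 10.0 * np.ones((8, 8))
surround = np.ones((8, 8))
print(relative_echo_level_db(roi, surround))
```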
A sound rendering system comprising a loudspeaker in front of a listener, a fully open-air headphone set, and adaptive filters is described. This system enables cancellation of the sound from the loudspeaker at one ear of the listener, as well as generation of a delayed and attenuated version of the loudspeaker sound. The delay and attenuation are adjusted to control the sound image direction. Unlike conventional systems, the adjustment is accomplished irrespective of the listener's position. The performance was evaluated in terms of the estimation error and the perception of the sound image. The estimation error was simulated on the assumption of an impulsive head movement and was insignificant for minor or slow movements. The sound image direction and distance were investigated psychoacoustically. The sound image direction was controllable from left to right, and there was no significant difference between the distance perceived with the proposed system and that with an actual source. These results indicate that the proposed system enables individual localization control over a frontal semicircle.
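The core rendering operation, presenting a delayed and attenuated version of the loudspeaker signal at one ear, reduces to a few lines. The delay and gain values below are arbitrary illustrations; the paper's adaptive-filter estimation of the cancellation path is not reproduced here:

```python
import numpy as np

def delayed_attenuated(x, delay_samples, gain):
    """Return gain * x delayed by delay_samples: the signal the system
    feeds to the headphone so that the interaural delay and level
    difference place the perceived sound image."""
    y = np.zeros_like(x)
    y[delay_samples:] = gain * x[:len(x) - delay_samples]
    return y

x = np.arange(8, dtype=float)
print(delayed_attenuated(x, 3, 0.5))
```

Sweeping the delay and gain pair then moves the image along the frontal semicircle, which is the controllability result the abstract reports.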
Acoustic emission from a single bubble is observed using a needle-type hydrophone and compared with Mie-scattering data. The acoustic emission is composed of two signals, i.e., pulses occurring at the moment of collapse and a background emission ΔV. Based on the average of ΔV and the maximum bubble radius Rmax, the collapses can be grouped into two types. The average power of ΔV is proportional to, or increases more rapidly than, R³max. It is suggested that the background emission is associated with a multipolar source of acoustic waves in which vortices surrounding the bubble play an important role.
This study presents an alternative control system in which the acoustic impedance of the diaphragm of an electro-acoustic transducer can be manipulated by modifying the design parameters of the control system. This system involves a state-space description of an electro-acoustic transducer that is derived from its electrical equivalent circuit using modern control theory. The optimal quadratic regulator was used in the control system design, and the quadratic performance index was formulated to relate to the square of the sound pressure near the diaphragm of the control system. Computer simulations were performed to test the proposed control system and indicated that significant reductions in the acoustic impedance density could be achieved near the assumed vibration frequency that was used in the formulation of the quadratic performance index. A computer model of the proposed control system was used to illustrate effective active noise control in a duct and indicated that the control system brings about an effect similar to that of a resonator type muffler.
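The LQR design step described above can be sketched generically: solve the continuous-time algebraic Riccati equation for a state-space model and form the optimal state-feedback gain. The second-order mass-damper-stiffness model and all weights below are assumed placeholders, not the paper's electrical equivalent circuit or its pressure-based performance index:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Assumed second-order stand-in for a transducer diaphragm
# (mass m, damping c, stiffness k); state x = [displacement, velocity].
m, c, k = 0.01, 0.5, 1.0e4
A = np.array([[0.0, 1.0],
              [-k / m, -c / m]])
B = np.array([[0.0], [1.0 / m]])

# Quadratic performance index: penalize the state (a crude proxy for
# sound pressure near the diaphragm) against control effort.
Q = np.diag([1.0e4, 1.0])
R = np.array([[1.0e-3]])

# Solve the algebraic Riccati equation and form the LQR gain u = -K x.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)

# LQR guarantees a stable closed loop: all poles in the left half-plane.
poles = np.linalg.eigvals(A - B @ K)
print(np.all(poles.real < 0))
```

In the paper's setting, the resulting feedback reshapes the diaphragm's effective acoustic impedance near the frequency emphasized in the performance index, which is what produces the resonator-muffler-like behavior.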
This paper introduces a motion tracking system useful for monitoring articulatory movements. The system combines two sensor units: a magnetometer unit and an optical sensor unit. The magnetometer unit consists of sensors having two amorphous alloy cores and small permanent magnet rods glued on the tongue surface. The measuring principle is based on the change in magnetic-field intensity with the distances between the rods and the sensors. The optical sensor unit consists of a position-sensitive device (PSD) and light-emitting diodes (LEDs) attached to several selected points on the articulators. Simultaneous measurements were made with the two units in combination: two points on the tongue surface were measured with the magnetic sensing unit, and five points (two on the jaw, two on the lips, and one on the nose) were measured with the optical sensing unit.
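The magnetometer unit's measuring principle, field intensity falling off with magnet-to-sensor distance, amounts to inverting a calibrated falloff law. A deliberately simplified sketch, assuming a dipole-like 1/r³ dependence with unit calibration constant (the actual calibration in the paper will differ):

```python
def distance_from_field(intensity, k=1.0):
    """Invert an assumed dipole-like falloff, intensity = k / r**3,
    to recover the magnet-to-sensor distance r. Both the calibration
    constant k and the cubic law are illustrative assumptions."""
    return (k / intensity) ** (1.0 / 3.0)

# An eightfold drop in measured intensity doubles the inferred distance:
print(distance_from_field(1.0), distance_from_field(0.125))
```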
We have developed a method of segregating desired speech from concurrent sounds received by two microphones. In this method, which we call SAFIA, signals received by two microphones are analyzed by discrete Fourier transformation. For each frequency component, differences in the amplitude and phase between channels are calculated. These differences are used to select frequency components of the signal that come from the desired direction and to reconstruct these components as the desired source signal. To clarify the effect of frequency resolution on the proposed method, we conducted three experiments. First, we analyzed the relationship between frequency resolution and the power spectrum's cumulative distribution. We found that the speech-signal power was concentrated on specific frequency components when the frequency resolution was about 10 Hz. Second, we determined whether a given frequency resolution decreased the overlap between the frequency components of two speech signals. A 10-Hz frequency resolution minimized the overlap. Third, we analyzed the relationship between sound quality and frequency resolution through subjective tests. The best frequency resolution in terms of sound quality corresponded to the frequency resolutions that concentrated the speech-signal power on specific frequency components and that minimized the degree of overlap. Finally, we demonstrated that this method improved the signal-to-noise ratio by over 18 dB.
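The per-bin selection at the heart of SAFIA can be sketched for a single DFT frame with two tones. The mixing gains, the frequencies, and the use of the amplitude difference alone (ignoring the phase cue the method also exploits) are simplifying assumptions:

```python
import numpy as np

sr, n = 16384, 4096
t = np.arange(n) / sr

# Two sources, each louder at the nearer microphone (assumed gains):
s1 = np.sin(2 * np.pi * 400 * t)    # desired source, dominant at mic 1
s2 = np.sin(2 * np.pi * 1024 * t)   # interferer, dominant at mic 2
mic1 = 1.0 * s1 + 0.3 * s2
mic2 = 0.3 * s1 + 1.0 * s2

# SAFIA-style selection: DFT both channels, keep each frequency bin in
# which channel 1 has the larger amplitude, zero the rest, resynthesize.
X1, X2 = np.fft.rfft(mic1), np.fft.rfft(mic2)
keep = np.abs(X1) > np.abs(X2)
recovered = np.fft.irfft(np.where(keep, X1, 0.0), n=n)

# The interferer's bins are assigned to channel 2 and removed:
print(np.allclose(recovered, s1, atol=1e-6))  # True
```

Because speech power concentrates in a few bins at roughly 10 Hz resolution, as the experiments above show, this binary bin assignment discards little of the desired signal while removing most of the interference.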