This paper reviews recent issues and findings in the production and perception of expressive speech and their application to speech synthesis. Specifically, it discusses current problems in data collection and labeling, techniques for analyzing voice quality, and the use of speech synthesis as an analysis tool. Directions for future work to improve the synthesis of expressive speech are suggested, focusing on better modeling, labeling, and voice quality analysis.
Low-frequency sound propagation and bottom sediment properties in shallow water were studied in the Shallow Water Acoustic Technology (SWAT) experiments conducted in the East China Sea. In these experiments, a hydraulic-type acoustic source was towed at a constant mid-water depth over a range of about 30 km while transmitting low-frequency CW signals, which were received on a bottom-moored vertical line array. After a time-dependent factor was suppressed in the received signals, the asymptotic Hankel transform was applied to the acoustic field sampled along the resulting synthetic-aperture horizontal array at each receiver depth. The horizontal wavenumber spectra obtained in this way had peaks corresponding to the normal modes, but the peak positions differed slightly among receiver depths, partly because of noise and range dependence. Stochastic mode inversion was therefore applied, using all of the identified peak positions to estimate the geoacoustic properties. The sound field simulated with the estimated properties was compared with the measured field at each receiver depth, and excellent agreement was confirmed not only at the frequency used for the inversion but also at a different frequency.
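The asymptotic Hankel transform step can be illustrated with a synthetic field. Assuming a range-independent waveguide and an exp(-iωt) time convention, each mode contributes a term proportional to exp(i k_n r)/sqrt(k_n r), so weighting the field by sqrt(r) and Fourier transforming over range gives peaks at the modal wavenumbers k_n, up to a slowly varying amplitude factor. This is a minimal sketch only: the modal wavenumbers, amplitudes, 5 m spacing, and the Hann taper are hypothetical choices, not SWAT values or the authors' processing.

```python
import numpy as np


def wavenumber_spectrum(p, r, n_fft=2**18):
    """Magnitude of the asymptotic Hankel transform, |sum p(r) sqrt(r) exp(-i k r) dr|,
    evaluated on a k grid with a zero-padded FFT. Returns non-negative k (rad/m)."""
    dr = r[1] - r[0]
    weighted = p * np.sqrt(r) * np.hanning(len(r))  # Hann taper (assumed) lowers sidelobes
    g = np.abs(np.fft.fft(weighted, n_fft)) * dr     # start-range phase factor drops out of |.|
    k = 2 * np.pi * np.fft.fftfreq(n_fft, d=dr)      # cycles/m -> rad/m
    keep = k >= 0
    return k[keep], g[keep]


def pick_peaks(k, g, rel_threshold=0.2):
    """Wavenumbers of local maxima exceeding rel_threshold * max(g)."""
    inner = (g[1:-1] > g[:-2]) & (g[1:-1] >= g[2:]) & (g[1:-1] > rel_threshold * g.max())
    return k[1:-1][inner]


# Synthetic aperture: about 30 km of tow sampled every 5 m (hypothetical geometry).
r = np.arange(1000.0, 31000.0, 5.0)
k_modes = np.array([0.4150, 0.4100, 0.4010])  # hypothetical modal wavenumbers (rad/m)
amps = np.array([1.0, 0.8, 0.6])              # hypothetical modal amplitudes
p = sum(a * np.exp(1j * kn * r) / np.sqrt(kn * r) for a, kn in zip(amps, k_modes))

k, g = wavenumber_spectrum(p, r)
print(pick_peaks(k, g))
```

With real data, noise and range dependence move these peaks slightly from one receiver depth to the next, which is why the abstract pools the peak positions from all depths in the inversion rather than trusting a single depth.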
A method is presented for canceling the error induced by the parametric effect in airborne ultrasound Doppler velocimetry. In this technique, the received ultrasound signal contains phase modulation caused by the nonlinear parametric effect in addition to the direct Doppler effect produced by the vibration velocity of the specimen. The authors investigate how the parametric effect depends on the size and vibration frequency of the vibrating surface. By measuring the acoustic pressure radiated by the vibrating target, the parametric effect is taken into account when estimating the vibration velocity. The results show that cancellation reduces the measurement error compared with the uncorrected estimate; for example, for a body of 100 mm diameter vibrating at 1.5 kHz, the error is reduced by more than 50%.
To investigate the effect of room acoustic conditions on performing musicians, subjective experiments were conducted with professional musicians using a digital sound field simulation technique. In the simulation system, two anechoic rooms were acoustically connected to virtually reproduce a situation in which two musicians play in ensemble on the stage of a hall. Using this system, a subjective experiment on the ease of chamber music performance was carried out with professional musicians to investigate the preferred conditions for ensemble performance. Based on the experimental results, the relationships between the physical characteristics of the sound field and the musicians' psychological judgments are discussed.
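One common way to connect two anechoic rooms digitally is to pick up each player's sound close to the instrument and convolve it with a player-to-player impulse response of the stage before reproducing it in the partner's room. The sketch below assumes that FIR-convolution approach. The impulse response is entirely synthetic: the 3 m spacing, reflection delays and gains, reverberation time, and tail level are hypothetical values, not measurements from the study.

```python
import numpy as np

FS = 48_000   # audio sampling rate (Hz), assumed
C = 343.0     # speed of sound in air (m/s)


def synthetic_stage_ir(distance_m=3.0, reflections=((0.012, 0.5), (0.025, 0.35)),
                       rt60=1.5, length_s=1.0, seed=0):
    """Hypothetical player-to-player impulse response: direct sound, a few early
    reflections (extra delay in s, relative gain), and an exponentially decaying
    noise tail whose energy falls by 60 dB over rt60 seconds."""
    n = int(length_s * FS)
    h = np.zeros(n)
    d0 = int(round(distance_m / C * FS))           # direct-sound delay in samples
    h[d0] = 1.0 / distance_m                       # 1/r spherical spreading
    for extra_delay, gain in reflections:
        h[d0 + int(round(extra_delay * FS))] += gain / distance_m
    t = np.arange(n) / FS
    decay = 10 ** (-3 * t / rt60)                  # amplitude envelope: 1e-3 (-60 dB) at rt60
    tail = 0.05 * np.random.default_rng(seed).standard_normal(n) * decay
    tail[: d0 + int(round(0.03 * FS))] = 0.0       # late field starts ~30 ms after direct sound
    return h + tail


def render_for_partner(anechoic_signal, ir):
    """Signal reproduced in the other room: close-miked anechoic sound convolved with the IR."""
    return np.convolve(anechoic_signal, ir)


ir = synthetic_stage_ir()
click = np.zeros(FS // 10)
click[0] = 1.0
out = render_for_partner(click, ir)
print(len(out), int(np.argmax(np.abs(out))))
```

Varying the reflection pattern or reverberation time in such an impulse response is one way the physical stage conditions could be changed between trials while the musicians judge ease of ensemble.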
Solid models of the vocal tract including the hypopharyngeal cavities were fabricated by stereolithography from MRI data obtained from a male speaker producing the Japanese vowels /a/ and /o/. A vowel synthesis experiment using the models showed relatively good agreement in the second and third formants, as well as in the anti-resonance at 4–5 kHz. Removing the piriform fossae from the models eliminated the anti-resonance and shifted the adjacent formants. Replacing the laryngeal cavity with a uniform tube changed the spectrum in the 1.5–7.0 kHz range. These acoustic effects of the hypopharyngeal cavities depended on the vocal tract shape.