When a complex tone contains many harmonics, its pitch is usually determined by harmonics in a restricted frequency region, called the “dominant region,” which for fundamental frequencies (F0s) ≥ 100 Hz corresponds to low, resolved harmonics. We estimated the dominant region for tones with low F0 by measuring thresholds (F0DLs) for detecting a change in the F0 of a group of harmonics embedded within harmonics with a fixed F0. The spectral position of the shifted group was systematically varied. Components were added in either cosine or random phase. For F0s of 35 and 50 Hz, the position of the dominant region depended strongly on the relative phases of the components. When the envelope had a low peak factor, with multiple peaks per period (random phase), the dominant region fell at low harmonic numbers (for F0 = 50 Hz) or was not well defined (for F0 = 35 Hz). When the envelope had a high peak factor, with one peak per period (cosine phase), the dominant region fell at high harmonic numbers, where harmonics were unresolved. Generally, performance was better for cosine than for random phase. The results indicate that harmonics in the dominant region are not always resolved.
Manipulating spectral structure often degrades speech quality, mainly because of insufficient smoothness of the modified spectra between frames and ineffective spectral modification. This paper presents a new spectral modification method to improve the quality of modified speech. If frames are processed independently, discontinuous features may be generated. Therefore, a speech analysis technique called temporal decomposition (TD), which decomposes speech into event targets and event functions, is used to model the spectral evolution effectively. Instead of modifying the speech spectra frame by frame, we need only modify the event targets and event functions. This makes modification of the speech spectra easy, and the smoothness of the modified speech is ensured by the shape of the event functions. To improve spectral modification, we use Gaussian mixture model parameters (spectral-GMM parameters) to model the spectral envelope of each event target, and develop a new algorithm for modifying the spectral-GMM parameters in accordance with formant scaling factors. We first evaluate the effectiveness of the proposed method for spectral modeling, and then apply it to two applications that require different amounts of spectral modification: emotional speech synthesis and voice gender conversion. Experimental results verify the effectiveness of the proposed method for both spectral modeling and spectral modification.
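As a rough illustration of the spectral-GMM idea described above (a minimal sketch, not the paper's implementation: the mixture parameters, the one-Gaussian-per-formant correspondence, and the scaling factors here are all assumed for the example), a spectral envelope can be represented as a sum of Gaussians over frequency, and formant scaling then reduces to shifting the Gaussian means:

```python
import numpy as np

def gmm_envelope(freqs, weights, means, stds):
    # Spectral envelope modeled as a weighted sum of Gaussians over frequency.
    env = np.zeros_like(freqs, dtype=float)
    for w, m, s in zip(weights, means, stds):
        env += w * np.exp(-0.5 * ((freqs - m) / s) ** 2)
    return env

def scale_formants(means, factors):
    # Modify spectral-GMM parameters: shift each Gaussian mean
    # (a formant-frequency proxy) by its scaling factor.
    return [m * f for m, f in zip(means, factors)]

freqs = np.linspace(0.0, 5000.0, 501)            # 10 Hz grid
weights, means, stds = [1.0, 0.6], [500.0, 1500.0], [120.0, 200.0]

orig = gmm_envelope(freqs, weights, means, stds)
mod = gmm_envelope(freqs, weights, scale_formants(means, [1.2, 1.2]), stds)

peak_orig = freqs[np.argmax(orig)]   # dominant peak near 500 Hz
peak_mod = freqs[np.argmax(mod)]     # shifted to near 600 Hz
```

Because the whole envelope is carried by a handful of mixture parameters, the same parameter edits can be applied once per event target rather than frame by frame, which is the point of combining the GMM representation with temporal decomposition.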
An ultrasonic motor using two bolt-clamped Langevin-type transducers is described. A rigorous optimization of the motor’s structure was conducted, and its results are reported with respect to various motor parameters. On the basis of FEM analysis and experimental results, it was established that the symmetric and anti-symmetric resonance frequencies could be matched by adjusting the mass at the tip of the motor’s head block. The driving voltage of the motor was reduced by using stacked multilayered piezoelectric elements. The motor fabricated in this study achieved a velocity of more than 1.5 m/s and a thrust of 25 N under certain conditions. However, a velocity of less than 100 mm/s could not be achieved using conventional resonance driving. For velocities lower than 1 mm/s, driving was achieved by “inertial driving.” A resolution of 1.5 nm was observed using DC driving.
This paper analyzes the error in MUSIC results caused by finite-precision arithmetic. The relation of this error to the source correlation level and to the array and source configuration parameters is clearly identified. As a result, an efficient array-design algorithm suitable for acoustic environments is derived. The algorithm is efficient in the sense that it can determine the minimum number of sensors. It is also quite general, as it includes the effects of all relevant parameters: the number of sources, source correlation level, maximum resolution, maximum source angle, number of sensors, sensor spacing, and arithmetic precision. Furthermore, the algorithm is shown to be seamlessly applicable in realistic environments, where many additional effects and sources of error often exist. It is shown that the algorithm is indispensable for DOA estimation in wide-band and reverberant environments.
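For reference, the MUSIC estimator whose error is analyzed above can be sketched in a few lines of numpy (a minimal narrowband example, not the paper's analysis: the array geometry, SNR, snapshot count, and angle grid are all illustrative, and the finite-precision and design aspects are not reproduced):

```python
import numpy as np

def steering(m, d, angle_deg, wavelength):
    # Steering vector of an m-element uniform linear array, spacing d.
    k = 2.0 * np.pi / wavelength
    phase = k * d * np.sin(np.deg2rad(angle_deg)) * np.arange(m)
    return np.exp(1j * phase)

rng = np.random.default_rng(0)
m, d, wl = 8, 0.5, 1.0                 # half-wavelength spacing
true_angles = [-20.0, 30.0]
n_snap = 200

A = np.column_stack([steering(m, d, a, wl) for a in true_angles])
S = rng.standard_normal((2, n_snap)) + 1j * rng.standard_normal((2, n_snap))
N = 0.1 * (rng.standard_normal((m, n_snap)) + 1j * rng.standard_normal((m, n_snap)))
X = A @ S + N                          # array snapshots

R = X @ X.conj().T / n_snap            # sample covariance
w, V = np.linalg.eigh(R)               # eigenvalues ascending
En = V[:, : m - 2]                     # noise subspace (2 sources assumed known)

grid = np.arange(-90.0, 90.0, 0.5)
p = [1.0 / np.linalg.norm(En.conj().T @ steering(m, d, a, wl)) ** 2 for a in grid]

# DOA estimates: the two tallest local maxima of the pseudospectrum.
locmax = [i for i in range(1, len(p) - 1) if p[i] > p[i - 1] and p[i] > p[i + 1]]
top2 = sorted(locmax, key=lambda i: p[i])[-2:]
est = sorted(grid[i] for i in top2)
```

In exact arithmetic the noise-subspace projection of a true steering vector vanishes, so the pseudospectrum peaks are unbounded; under finite precision that projection has a floor, which is the mechanism tying the error to the correlation level and array configuration discussed in the abstract.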
Computational acoustic vision by solving phase ambiguity confusion (CAVSPAC) is proposed for two-dimensional colorful imaging, such as pointillisme, in a broadband sound environment. The 2D distributions of equivalent point sources were identified as an image from the cross-power spectral phases of sound pressure measured by two pairs of microphones. Each point source was assigned a color corresponding to its frequency. Multiple candidate source locations arise from one cross-spectral phase value because of “phase ambiguity” at high frequencies, when the microphone interval is wider than the sound wavelengths. The true source location was extracted from the multiple candidates as the one that was frequency-independent. A broadband noise source was visualized with a single two-way loudspeaker set at various positions in a reverberant room. Using CAVSPAC, the 2D image could be identified for a broadband sound source from all directions spherically, except in the areas just beside, above, and below the microphones. A microphone interval moderately wider than the sound wavelengths led to better resolution of the source image.
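The frequency-independence test for resolving phase ambiguity can be sketched for a single microphone pair (a minimal one-dimensional delay version under assumed free-field propagation at c = 343 m/s; the spacing, frequencies, and delay are illustrative, and the paper's 2D CAVSPAC imaging with two pairs is more involved): each wrapped cross-spectral phase admits several candidate inter-microphone delays, and the true delay is the one shared across frequencies.

```python
import numpy as np

def candidate_delays(phase, f, d, c=343.0):
    # All inter-microphone delays consistent with a wrapped cross-spectral
    # phase at frequency f, bounded by +/- d/c (microphone spacing d).
    tau_max = d / c
    n_max = int(np.ceil(f * 2.0 * tau_max))
    cands = [(phase + 2.0 * np.pi * k) / (2.0 * np.pi * f)
             for k in range(-n_max, n_max + 1)]
    return [t for t in cands if abs(t) <= tau_max]

d = 0.5                                # spacing wider than the wavelengths used
true_tau = 0.8e-3                      # true inter-microphone delay (s)
freqs = [1000.0, 1300.0, 1700.0]       # wavelengths < d, so each phase is ambiguous
phases = [np.angle(np.exp(1j * 2.0 * np.pi * f * true_tau)) for f in freqs]

sets = [candidate_delays(p, f, d) for p, f in zip(phases, freqs)]
tol = 1e-5
# Keep only the candidate present (to within tolerance) at every frequency.
common = [t for t in sets[0]
          if all(any(abs(t - u) < tol for u in s) for s in sets[1:])]
```

Each frequency alone yields several plausible delays, but only the true one survives the intersection across frequencies, which is the abstract's criterion of extracting the frequency-independent candidate.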
Auditory signals are often used in the human-machine interfaces of electric consumer products to inform the user of the state of operation. The signals are expected to enhance the usability of products, especially for older adults who are not accustomed to using such products. Kurakata et al. [Acoust. Sci. & Tech., 29, 176–184 (2008)] reported experimental results on temporal patterns of auditory signals for electric home appliances, on which a Japanese Industrial Standard (JIS S 0013:2002) was based. However, all participants in their experiment were residents of Japan. Therefore, it remains unclear whether the information that the auditory-signal patterns convey can be understood unambiguously by people in other countries who have different cultural backgrounds and who use products with interface designs different from those sold on the Japanese market. This paper presents the results of an experiment in which American, German, and Korean listeners evaluated auditory signals, employing a procedure similar to that of Kurakata et al. By comparing their judgments with those of Japanese listeners, internationally acceptable temporal patterns of auditory signals are proposed.