Acoustical Science and Technology

PAPERS

A new algorithm for blind estimation of common poles in multiple transmission paths based on linear prediction

Takafumi Hikichi, Masato Miyoshi

2005, Volume 26, Issue 1, pp. 1-7
Published: 2005
Released: 2005/01/01

DOI: https://doi.org/10.1250/ast.26.1

Journal (free access)

This paper proposes a blind calculation method for the poles common to multiple signal transmission paths. In the field of room acoustics, the poles correspond to the mode frequencies that are determined by room size and shape, and they do not change when source and receiver locations change. Information on these acoustic poles is useful for many applications, including echo cancellation and sound field equalization in a room. Conventional pole estimation methods require a priori measurement of the room transfer functions. This paper proposes a new method for the blind calculation of the poles, where the poles are calculated solely from the observed signals. Simulation results show that the proposed algorithm provides precise estimates of the common poles.
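As a rough illustration of how linear prediction can recover poles shared by several observed signals, the sketch below fits a single second-order predictor to two noise-free toy signals at once. It is not the authors' algorithm: the function names, the second-order model, and the free-response test signals are all assumptions made for this example.

```python
import cmath
import math


def resonance(r, theta, x0, x1, n):
    """Free response of a two-pole resonator with poles r*exp(+/- j*theta).

    x0 and x1 are arbitrary initial samples (hypothetical stand-ins for
    different source/receiver positions).
    """
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    x = [x0, x1]
    for _ in range(n - 2):
        x.append(a1 * x[-1] + a2 * x[-2])
    return x


def common_poles_order2(signals):
    """Least-squares fit of x[n] = a1*x[n-1] + a2*x[n-2] over all signals.

    The prediction equations of every channel are stacked into one set of
    normal equations, so the fitted coefficients are shared by all channels.
    """
    s11 = s12 = s22 = b1 = b2 = 0.0
    for x in signals:
        for n in range(2, len(x)):
            p, q, y = x[n - 1], x[n - 2], x[n]
            s11 += p * p
            s12 += p * q
            s22 += q * q
            b1 += p * y
            b2 += q * y
    det = s11 * s22 - s12 * s12
    a1 = (b1 * s22 - b2 * s12) / det
    a2 = (s11 * b2 - s12 * b1) / det
    # Poles are the roots of z^2 - a1*z - a2 = 0.
    disc = cmath.sqrt(a1 * a1 + 4.0 * a2)
    return (a1 + disc) / 2.0, (a1 - disc) / 2.0


# Two channels that share the same pole pair but start from different states.
ch1 = resonance(0.9, 0.6, 1.0, 0.3, 50)
ch2 = resonance(0.9, 0.6, -0.5, 0.8, 50)
pole, pole_conj = common_poles_order2([ch1, ch2])
```

In this noise-free case the estimated pole radius and angle match the values used to synthesize the signals; real room responses would need a much higher model order and some handling of the unknown source signal.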

Frequency domain adaptive algorithm with nonlinear function of error-to-reference ratio for double-talk robust echo cancellation

Suehiro Shimauchi, Yoichi Haneda, Akitoshi Kataoka

2005, Volume 26, Issue 1, pp. 8-15
Published: 2005
Released: 2005/01/01

DOI: https://doi.org/10.1250/ast.26.8

Journal (free access)

Several adaptive algorithms for robust echo cancellation use nonlinear reference and/or error functions. Most of them require time-variant threshold estimators, e.g., noise level estimators or double-talk detectors, since their nonlinearities have to be adjusted in response to changes in near-end noise or speech signal levels. We propose a new frequency domain adaptive algorithm: the gradient-limited fast least-mean-squares (GL-FLMS), in which the coefficients are updated by using a nonlinear function of the error scaled by the reference magnitude, i.e., the error-to-reference ratio (ERR). When the acoustic coupling level between loudspeaker and microphone is bounded, the ERR is also bounded in the case of single-talk, but may increase during double-talk. The GL-FLMS limits unexpected increases in the ERR with fixed thresholds and prevents divergence of the coefficients, while not neglecting updates to adjust when a large reference signal introduces a large error during single-talk.
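One possible reading of "limiting the ERR with a fixed threshold" is magnitude clipping of the error-to-reference ratio in each frequency bin. The sketch below shows only that idea for a single bin. The function names, the step size `mu`, the threshold value, and the exact form of the nonlinearity are assumptions, not the published GL-FLMS definition.

```python
def limit_ratio(err, ref, threshold):
    """Clip the magnitude of err/|ref| to a fixed threshold, keeping its phase."""
    mag_ref = abs(ref)
    if mag_ref == 0.0:
        return 0j
    ratio = err / mag_ref
    mag = abs(ratio)
    if mag > threshold:
        ratio *= threshold / mag
    return ratio


def update_bin(w, err, ref, mu, threshold):
    """One coefficient update for one frequency bin (hypothetical form).

    Without clipping this reduces to a normalized LMS step,
    w + mu * conj(ref) * err / |ref|^2.
    """
    mag_ref = abs(ref)
    if mag_ref == 0.0:
        return w
    return w + mu * (ref.conjugate() / mag_ref) * limit_ratio(err, ref, threshold)


# Small error relative to the reference: the step is left unchanged.
w_single = update_bin(0j, 0.2 + 0j, 2.0 + 0j, 1.0, 0.5)
# Large error relative to the reference: the step is capped at mu * threshold.
w_double = update_bin(0j, 10.0 + 0j, 2.0 + 0j, 1.0, 0.5)
```

Because the limit applies to the ratio rather than to the raw error, a large error caused by a large reference signal still produces a full-size update, which is the behaviour the abstract describes for single-talk.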

Individual variation of the hypopharyngeal cavities and its acoustic effects

Tatsuya Kitamura, Kiyoshi Honda, Hironori Takemoto

2005, Volume 26, Issue 1, pp. 16-26
Published: 2005
Released: 2005/01/01

DOI: https://doi.org/10.1250/ast.26.16

Journal (free access)

Morphological measurements of the hypopharynx are conducted to investigate the correlation between fine structures of the vocal tract and speaker characteristics. The hypopharynx includes the laryngeal tube and bilateral cavities of the piriform fossa. MRI data during sustained phonation of the five Japanese vowels by four subjects are obtained to analyze intra- and inter-speaker variation of the hypopharynx. Morphological analysis on the mid-sagittal and transverse planes revealed that the shape of the hypopharynx was relatively stable, regardless of vowel type, in contrast to relatively large inter-speaker variation, and these results are confirmed quantitatively by a simple similarity method. The small intra-speaker variation of the hypopharynx is confirmed by further morphological analysis using high-quality MRI data for one of the subjects, obtained by using the “phonation-synchronized method” and “custom laryngeal coil.” Furthermore, acoustical effects of the individual variation of the hypopharynx are estimated by using a transmission line model. Vocal tract area function of one of the subjects above the hypopharynx is combined with the hypopharyngeal cavities of other subjects, and their transfer functions are calculated. The results show that the inter-speaker variation of the hypopharynx affects spectra in the frequency range beyond approximately 2.5 kHz.

Durational shrinkage by noise replacement in quasi-isochronous and hyper-isochronous contexts

Minoru Tsuzaki, Hiroaki Kato

2005, Volume 26, Issue 1, pp. 27-34
Published: 2005
Released: 2005/01/01

DOI: https://doi.org/10.1250/ast.26.27

Journal (free access)

When a portion of a sound is replaced by a noise burst, its duration is perceived to be shorter than that of its intact counterpart. To test the robustness of this shrinking effect by noise replacement and to validate the hypothesis that duration can be estimated as a function of accumulated perceptual evidence for the target sound, the shrinking effect was investigated with tonal stimuli in two contextual temporal structures. Two experiments are conducted using (1) a tone with an envelope pattern copied from a naturally spoken word, and (2) an isochronous sequence of four tones. In most cases, the noise replacement causes the perceived duration of the target tone to shrink from that of its intact counterpart. However, a reversal/prolongation tendency by noise was observed for the stimulus with a deviation slightly shorter than an isochronous structure in the second experiment. Although this reversal tendency partially supports the hypothesis that a noise merely enhances a contextual effect (the contextual enhancement hypothesis), the shrinking effect observed under the other conditions was difficult to explain by the contextual enhancement hypothesis. The shrinking effect could be explained in a framework of the traditional neural counting mechanisms with one additional mechanism to control the degree of gate opening depending on the perceptual evidence of the target sound.

Effects of deviation from isochronism on the durational shrinkage by noise replacement

Minoru Tsuzaki, Hiroaki Kato

2005, Volume 26, Issue 1, pp. 35-42
Published: 2005
Released: 2005/01/01

DOI: https://doi.org/10.1250/ast.26.35

Journal (free access)

The duration of sounds generally tends to be perceived as shorter when a portion is replaced by a noise burst. However, a reversal/prolongation tendency can occur if a compelling isochronous context is functioning. To test the robustness of the durational shrinkage as well as to investigate what aspect is the core feature providing the isochronism, three experiments are conducted using (1) a non-isochronous sequence of four tones, (2) a four-tone sequence whose interonset intervals fluctuate randomly, and (3) a four-tone sequence whose interonset intervals are fixed to be isochronous irrespective of adjustment by human participants in the experiment. In most cases, the noise replacement causes the perceived duration of the target tone to shrink compared to that of its intact counterpart. Furthermore, the reduction of isochronous context results in the reduction of the reversal tendency, although the shrinking effect cannot be observed clearly either. The effects of noise replacement and context are discussed in relation to the contribution of local cues provided by the perceptual evidence as well as the contribution of a global cue provided by an isochronous interonset interval.

Efficient two-stage vector quantization speech coder using wavelet coefficients of excitation signals

Seiji Hayashi, Masahiro Suguimoto, Erinnoviar

2005, Volume 26, Issue 1, pp. 43-49
Published: 2005
Released: 2005/01/01

DOI: https://doi.org/10.1250/ast.26.43

Journal (free access)

An improved backward prediction coder featuring two-stage vector quantization (VQ) of shape codevectors is presented. Efficient two-stage VQ is achieved using the wavelet coefficients of excitation signals; i.e., wavelet coefficients are calculated by applying a discrete wavelet transform to excitation signals, and the results are divided into an approximation group and a detail group. The data lengths of both approximation and detail coefficients are half that of conventional two-stage VQ systems. Simulation results show that the proposed coder achieves a better weighted signal-to-noise ratio (WSNR) than conventional coders and, in terms of reconstructed speech quality, ranks between the FS-1016 Code Excited Linear Prediction (CELP) coder and the Vector Sum Excited Linear Predictive Coding (VSELP) coder.
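The split into an approximation group and a detail group, each half the length of the input, can be seen with a one-level discrete wavelet transform. The sketch below uses the Haar wavelet purely as an assumption, since the abstract does not name the wavelet, and the sample vector is made up for illustration.

```python
import math

_SCALE = 1.0 / math.sqrt(2.0)


def haar_dwt(x):
    """One-level Haar DWT of an even-length sequence.

    Returns (approximation, detail); each has half the input length.
    """
    approx = [(x[i] + x[i + 1]) * _SCALE for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * _SCALE for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) * _SCALE, (a - d) * _SCALE])
    return x


# Hypothetical excitation vector of length 8.
excitation = [1.0, 3.0, -2.0, 0.5, 4.0, 4.0, 0.0, -1.0]
approx, detail = haar_dwt(excitation)
restored = haar_idwt(approx, detail)
```

Each group could then be quantized against its own codebook of half-length codevectors; the codebook design itself is outside the scope of this sketch.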

Detection threshold for distortions due to jitter on digital audio

Kaoru Ashihara, Shogo Kiryu, Nobuo Koizumi, Akira Nishimura, Juro Ohga ...

2005, Volume 26, Issue 1, pp. 50-54
Published: 2005
Released: 2005/01/01

DOI: https://doi.org/10.1250/ast.26.50

Journal (free access)

The detection threshold for distortions due to time jitter was measured in a two-alternative forced-choice paradigm with switching sounds. Music signals with random jitter were simulated in the digital domain. The size of the jitter was arbitrarily controlled so that the detection threshold could be estimated. Professional audio engineers, sound engineers, audio critics and semi-professional musicians participated as listeners. The listeners were allowed to use their own listening environments and their favorite sound materials. It was shown that the detection threshold for random jitter was several hundred ns for well-trained listeners under their preferred listening conditions.
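To give a feel for the size of the distortion involved, the sketch below samples a sine wave at instants perturbed by random timing errors and compares it with the ideally sampled signal. The test tone, the 48 kHz sampling rate, the Gaussian jitter model, and the 500 ns RMS value are assumptions chosen for illustration; the study itself used music signals and its own simulation procedure.

```python
import math
import random


def sample_sine(freq_hz, fs_hz, n_samples, jitter_rms_s, seed=0):
    """Sample sin(2*pi*f*t) at nominal instants n/fs plus Gaussian timing error."""
    rng = random.Random(seed)
    out = []
    for n in range(n_samples):
        t = n / fs_hz
        if jitter_rms_s > 0.0:
            t += rng.gauss(0.0, jitter_rms_s)
        out.append(math.sin(2.0 * math.pi * freq_hz * t))
    return out


def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


clean = sample_sine(10_000.0, 48_000.0, 4800, 0.0)
jittered = sample_sine(10_000.0, 48_000.0, 4800, 500e-9)
error = rms_error(clean, jittered)
```

For a sinusoid the distortion grows with both the signal frequency and the jitter size, so the same timing error is far less harmful at low frequencies.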

Nonlinearly generated second harmonic sound in a focused beam reflected from free surface

Shigemi Saito

2005, Volume 26, Issue 1, pp. 55-61
Published: 2005
Released: 2005/01/01

DOI: https://doi.org/10.1250/ast.26.55

Journal (free access)

Introducing a phase reversal in the focal region compensates for the phase reversal caused by diffraction, so an enhancement of second harmonic generation is expected. To verify this, the reflection of a focused beam at a water surface placed in the focal region is observed experimentally. The focusing source, which employs a LiNbO₃ plate with a ferroelectric inversion layer, also serves as the receiver of the reflected second harmonic sound. The experimental result is compared with a theoretical calculation based on the Khokhlov-Zabolotskaya-Kuznetsov equation, in which the phase reversal of both the fundamental and second harmonic components is assumed to occur at the free surface. The experimental result agrees reasonably well with the predicted increase in the second harmonic amplitude by a factor of 2.0. Since this growth rate is sensitive to the velocity dispersion that occurs in different liquid media, such a measurement of the second harmonic component may be useful for estimating the dispersion.


TECHNICAL REPORT

The present status, progress, and usage of speech databases in Japan

Hisao Kuwabara, Shuich Itahashi, Mikio Yamamoto, Satoshi Nakamura, Tos ...

2005, Volume 26, Issue 1, pp. 62-66
Published: 2005
Released: 2005/01/01

DOI: https://doi.org/10.1250/ast.26.62

Journal (free access)

This report describes the present status, progress, and usage of Japanese speech databases. Speech database projects in Japan started in the early 1980s. The first was by the Japan Electronic Industry Development Association (JEIDA), which aimed at creating a speech database to evaluate the performance of existing speech input/output machines and systems. Several database projects have been undertaken since then, including the one initiated by the Advanced Telecommunication Research Institute (ATR), and an enormous amount of spontaneous speech data is now available. A survey was recently conducted on the usage of existing speech databases among industry and university institutions in Japan where speech research is actively carried out. It revealed that ATR’s continuous speech database is the most frequently used, followed by the equivalent version from the Acoustical Society of Japan.


ACOUSTICAL LETTERS

Transmission characteristics of ear canal of artificial head

Kiyoshi Sugiyama, Mitsugu Nishimoto, Machiko Satoh

2005, Volume 26, Issue 1, pp. 67-70
Published: 2005
Released: 2005/01/01

DOI: https://doi.org/10.1250/ast.26.67

Journal (free access)

On sound spectral model of road vehicle for prediction of road traffic noise: Considerations for establishing the ASJ RTN-Model 2003

Teruo Iwase, Kunio Nakasaki, Yoshiharu Namikawa, Teiji Mori

2005, Volume 26, Issue 1, pp. 71-75
Published: 2005
Released: 2005/01/01

DOI: https://doi.org/10.1250/ast.26.71

Journal (free access)

Differential travel time series of the reciprocal transmission in 1999 ocean acoustic tomography data

Yong Wang, Hiroyuki Hachiya

2005, Volume 26, Issue 1, pp. 76-78
Published: 2005
Released: 2005/01/01

DOI: https://doi.org/10.1250/ast.26.76

Journal (free access)

Reproduced sound pressure level yielding the maximum auditory presence: Further study on effects of reproduced SPLs on auditory presence

Mayuko Mitsuki, Manabu Miyasaka, Kenji Ozawa

2005, Volume 26, Issue 1, pp. 79-81
Published: 2005
Released: 2005/01/01

DOI: https://doi.org/10.1250/ast.26.79

Journal (free access)

The responses of neurons in the gerbil inferior colliculus to virtual acoustic space stimuli

Katuhiro Maki, Shigeto Furukawa

2005, Volume 26, Issue 1, pp. 82-84
Published: 2005
Released: 2005/01/01

DOI: https://doi.org/10.1250/ast.26.82

Journal (free access)

