Journal of the Acoustical Society of Japan (E)

Constraints on computational models of auditory scene analysis, as derived from human perception

Albert S. Bregman

1995 Volume 16 Issue 3 Pages 133-136
Published: 1995
Released on J-STAGE: February 17, 2011

DOI https://doi.org/10.1250/ast.16.133

JOURNAL FREE ACCESS

Auditory scene analysis (ASA) groups the sensory evidence derived from a mixture of sound sources and derives separate descriptions of the individual sources. There are two possible ways of doing this: (a) using knowledge of the properties of certain classes of sounds, such as speech, in a top-down fashion; (b) using, in a bottom-up manner, those properties in the input (such as harmonic relations) that are typically found when two or more sound sources (e.g., voice and footsteps) affect the input at the same time. Some properties of the ASA done by humans are discussed and proposed as specifications for a computational model: the adding up of evidence, the coherence of the ASA system, the combining of top-down and bottom-up constraints, as illustrated in the duplex perception of speech, and the propagation of influences across the auditory field.

A system-theoretical evaluation method for the reverberation time of an acoustically coupled room system

Mitsuo Ohta, Hiroshi Yamada, Hirofumi Iwashige

1995 Volume 16 Issue 3 Pages 137-145
Published: 1995
Released on J-STAGE: February 17, 2011

DOI https://doi.org/10.1250/ast.16.137

JOURNAL FREE ACCESS

Reverberation time is very important for evaluating the reverberation characteristics of a room and the sound power of an acoustic source. It was originally defined for an acoustically isolated single room. In an acoustically coupled room system, however, the reverberation times of the individual rooms differ considerably from those measured independently, without mutual power flow through apertures or acoustic insulation walls. In this paper, a general characteristic expression for acoustically coupled rooms is first established as a power flow system consisting of several subsystems, in close connection with the Statistical Energy Analysis method; it corresponds to a kind of system equation in state estimation theory. On the basis of a solution of this system equation, six new methods are tentatively proposed, each introducing a different evaluation criterion, to evaluate the reverberation time of the acoustically coupled room system from various system-theoretical viewpoints. Finally, the effectiveness of the proposed theory is confirmed experimentally by applying it to actual coupled rooms, and the six evaluation methods are compared with each other.

A novel spotting-based approach to continuous speech recognition: Minimum error classification of keyword-sequences

Takashi Komori, Shigeru Katagiri

1995 Volume 16 Issue 3 Pages 147-157
Published: 1995
Released on J-STAGE: February 17, 2011

DOI https://doi.org/10.1250/ast.16.147

JOURNAL FREE ACCESS

To overcome the lack of a theoretical basis for the fundamental word-spotting-based approach to recognizing natural, spontaneous speech utterances, we propose a novel spotter (spotting system) design method referred to as Minimum Error Classification of Keyword-sequences (MECK). The key concept of the method is to formalize the entire spotting process as a trainable functional form whose design objective is the classification accuracy of the keyword-sequence (a string of prescribed keyword categories). The resulting MECK procedure allows one to design spotters efficiently, using only pairs of utterances and their corresponding phonemic transcriptions (no hand-segmented labels are required), and in a mathematically proven way that is consistent with minimizing the keyword-sequence classification error. MECK is quite general and can be applied to any reasonable spotter structure. The paper specifically presents implementation details for a prototype-based spotter and demonstrates the utility of this MECK-trained spotter in several Japanese keyword spotting tasks.

A correction of the insertion-loss for constant sound pressure with flow

Tsuyoshi Nishimura, Tsuyoshi Usagawa, Masanao Ebata

1995 Volume 16 Issue 3 Pages 159-164
Published: 1995
Released on J-STAGE: February 17, 2011

DOI https://doi.org/10.1250/ast.16.159

JOURNAL FREE ACCESS

For a constant sound pressure source, the insertion loss is evaluated by the B parameter of the four-terminal matrix method. However, predictions using the equations of the four-terminal transmission matrix method do not accurately reflect the phenomenon observed in practice. In this paper, a correction method for deriving the insertion loss, based on the Characteristic Curve Method, is presented. The four-terminal transmission matrix method was corrected by rewriting the real and imaginary parts as they depend solely on the flow velocity. The result was then compensated by adding the component of the temperature gradient.

Detection of unknown words in large vocabulary speech recognition

Satoru Hayamizu, Katunobu Itou, Kazuyo Tanaka

1995 Volume 16 Issue 3 Pages 165-171
Published: 1995
Released on J-STAGE: February 17, 2011

DOI https://doi.org/10.1250/ast.16.165

JOURNAL FREE ACCESS

This paper describes, through recognition and detection experiments, the relation between vocabulary size and detection errors for unknown words in large vocabulary speech recognition. Although the relation between vocabulary size and recognition performance has been reported, the relation between vocabulary size and detection performance has not yet been studied, especially for vocabulary sizes of over 1,000 words. Experiments were conducted using the speech material of speaker MAU from the ATR word speech database. The dictionary used contains 40,000 entries from the Shinmeikai Japanese Language Dictionary. It is shown that when the vocabulary size increases from 1,000 words to 40,000 words, the relation between vocabulary size and detection errors shows a tendency similar to that of the relation between vocabulary size and recognition errors. The increase in detection errors caused by increasing vocabulary size is also shown to be small for within-vocabulary words compared with that for out-of-vocabulary words. These results should be taken into account in designing large vocabulary speech recognition systems that include unknown word processing.

Sound reduction by a T-profile noise barrier

Masaki Hasebe

1995 Volume 16 Issue 3 Pages 173-179
Published: 1995
Released on J-STAGE: February 17, 2011

DOI https://doi.org/10.1250/ast.16.173

JOURNAL FREE ACCESS

This paper presents a study of the sound field diffracted by a T-profile noise barrier. To estimate the sound pressure behind the T-profile noise barrier, a calculation model was proposed. To verify the calculation model, experiments were conducted indoors using a reduced-scale noise barrier and outdoors using a full-scale noise barrier. Using the calculation model, the characteristics of T-profile noise barriers for noise reduction were estimated, including the effect of an absorbent material on the cap of the barrier.

Longitudinal-torsional hybrid transducer with a longitudinal resonance tuning electric port

Yoshikazu Koike, Nobuo Fujihara, Kentaro Nakamura, Sadayuki Ueha

1995 Volume 16 Issue 3 Pages 181-183
Published: 1995
Released on J-STAGE: February 17, 2011

DOI https://doi.org/10.1250/ast.16.181

JOURNAL FREE ACCESS

A longitudinal-torsional hybrid transducer with a longitudinal resonance frequency tuning electric port was proposed. The longitudinal resonance frequency can be adjusted independently of the torsional one by changing the external electrical impedance. This new technique can be applied to ultrasonic processing tools and ultrasonic motors.

An estimation method of L_x for arbitrary random noises based on the limited fluctuation level range

Hideo Minamihara, Mitsuo Ohta, Masafumi Nishimura, Yoshiaki Takakuwa

1995 Volume 16 Issue 3 Pages 185-188
Published: 1995
Released on J-STAGE: February 17, 2011

DOI https://doi.org/10.1250/ast.16.185

JOURNAL FREE ACCESS

Acoustic levitation of planar objects using a longitudinal vibration mode

Yoshiki Hashimoto, Yoshikazu Koike, Sadayuki Ueha

1995 Volume 16 Issue 3 Pages 189-192
Published: 1995
Released on J-STAGE: February 17, 2011

DOI https://doi.org/10.1250/ast.16.189

JOURNAL FREE ACCESS

A study of non-linear effect on acoustic impulse response measurement

Yutaka Kaneda

1995 Volume 16 Issue 3 Pages 193-195
Published: 1995
Released on J-STAGE: February 17, 2011

DOI https://doi.org/10.1250/ast.16.193

JOURNAL FREE ACCESS
