Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
Volume 35, Issue 1
TUTORIAL
  • Tsuyoshi Kuroda, Emi Hasuo
    2014 Volume 35 Issue 1 Pages 1-9
    Published: January 01, 2014
    Released on J-STAGE: January 01, 2014
    JOURNAL FREE ACCESS
    This tutorial article guides the reader through three types of psychophysical methods: the method of adjustment, the method of limits, and the method of constant stimuli. The article explains how to estimate the point of subjective equality and the just noticeable difference with each method. It also describes pitfalls that one may encounter when designing experiments. Results can be distorted by bias effects that arise from the technical procedures adopted for an experiment, and one should know where these effects are likely to occur.
INVITED REVIEW
  • Wolfgang Ellermeier, Karin Zimmer
    2014 Volume 35 Issue 1 Pages 10-16
    Published: January 01, 2014
    Released on J-STAGE: January 01, 2014
    JOURNAL FREE ACCESS
    The decrement in memory performance observed while listeners are being exposed to acoustically structured stimuli is called the irrelevant sound effect (ISE). The present review summarizes the research identifying physical features of the irrelevant background that reliably induce performance decrements. It shows that speech, or speech analogues, produce the largest effects by far, suggesting that speech-specific features may contribute to auditory distraction. When an attempt is made to isolate psychoacoustical parameters contributing to the effect, it turns out that noticeable spectral change over time is a necessary condition to observe an ISE, while level change by itself is not. New empirical evidence is presented determining the rate of frequency modulation at which maximal effects are obtained. Results of a further study employing noise-vocoded speech show the importance of spectral detail in producing an ISE. At present, the wealth of empirical findings on the effects of irrelevant sound is not well accounted for by the available theoretical models. Cognitive models make only qualitative predictions, and psychoacoustical models (e.g., those based on fluctuation strength or the speech transmission index) account for subsets of the available data, but have thus far failed to capture the combined effects of temporal structure and spectral change in generating the interference.
PAPERS
  • Peng Shen, Satoshi Tamura, Satoru Hayamizu
    2014 Volume 35 Issue 1 Pages 17-27
    Published: January 01, 2014
    Released on J-STAGE: January 01, 2014
    JOURNAL FREE ACCESS
    In this paper, we propose to use exemplar-based sparse representation features for noise-robust audio-visual speech recognition. First, we introduce a sparse representation technique and describe how noise robustness can be realized by using the sparse representation for noise reduction. Then, feature fusion methods are proposed to combine audio-visual features with the sparse representation. Our work provides new insight into two crucial issues in automatic speech recognition: noise reduction and robust audio-visual features. For noise reduction, we describe a method in which speech and noise are mapped into different subspaces by the sparse representation to reduce the noise. Our proposed method can be applied not only to audio noise reduction but also to visual noise reduction for several types of noise. For the second issue, we investigate two feature fusion methods (late feature fusion and the joint sparsity model method) to calculate audio-visual sparse representation features that improve the accuracy of audio-visual speech recognition. Our proposed method can also contribute to feature fusion for the audio-visual speech recognition system. Finally, to evaluate the new sparse representation features, a database for audio-visual speech recognition is used in this research. We show the effectiveness of our proposed noise reduction in both audio and visual cases for several types of noise and the effectiveness of audio-visual feature determination by the joint sparsity model, in comparison with the late feature fusion method and traditional methods.
  • Yasuhito Kawai, Masahiro Toyoda
    2014 Volume 35 Issue 1 Pages 28-34
    Published: January 01, 2014
    Released on J-STAGE: January 01, 2014
    JOURNAL FREE ACCESS
    Noise barriers are often very tall alongside highways with heavy traffic. Although these high barriers ensure the desired amount of noise attenuation, they are expensive to install and have a negative effect on the landscape. Consequently, many types of edge-modified noise barriers have been proposed to reduce the necessary height. Herein, an alternative noise barrier based on the "edge-effect" suppression technique is proposed, and the sound insulation performance is investigated both theoretically and experimentally. Numerical examples indicate that the diffracted sound is greatly attenuated by suppressing the particle velocity in the region with a large velocity amplitude using a thin absorbing material such as cloth with a gradational distribution in impedance. The experimental and theoretical results of insertion loss are in good agreement, validating the theoretical consideration and effectiveness of the cloth installed at the top of the barrier.
  • Hiroki Matsuzaki, Antoine Serrurier, Pierre Badin, Kunitoshi Motoki
    2014 Volume 35 Issue 1 Pages 35-41
    Published: January 01, 2014
    Released on J-STAGE: January 01, 2014
    JOURNAL FREE ACCESS
    In this paper, we describe a comparison of the acoustic characteristics of one-dimensional and three-dimensional models of vocal tracts with nasal coupling. One-dimensional acoustic propagation is computed using an electric analog model. A finite element method is used for three-dimensional acoustic simulation. The comparison of these two approaches involves the vocal-tract shape of two subjects, one Japanese male and one French male pronouncing the vowel /a/ in their native language. Results show that the pole/zero pairs ascribed to the nasal coupling for both simulations appeared at almost the same frequency, at least below 2 kHz. Little difference between the one-dimensional and three-dimensional simulations in the transfer functions for the French subject is observed, since the three-dimensional mesh for the French subject is smoother. An extra pole exists in the transfer function of the three-dimensional model for the Japanese subject, possibly caused by the asymmetric structure of the laryngeal cavity. In the three-dimensional distribution of the active sound intensity vectors for the French subject, sound energy fluxes circulate between oral and nasal cavities coupled in the vicinity of the lips and nostrils.
  • Tomoko Nariai, Kazuyo Tanaka
    2014 Volume 35 Issue 1 Pages 42-49
    Published: January 01, 2014
    Released on J-STAGE: January 01, 2014
    JOURNAL FREE ACCESS
    This paper presents a comparative analysis of the intensity of words in sentences uttered by Japanese speakers of English and native speakers of English (Japanese English and native English, henceforth). We investigate two parameters: intensity, which is defined here as the integral of the power for each word, and power peak, which is defined as the peak of the power for each word. The analyses reveal differences in both word class and word position. For word class, the nouns, interrogatives and negatives in Japanese English are produced with less intensity than their native English counterparts, whereas most function words are produced with more intensity. For word position, Japanese English sentence-final words are produced with less intensity than their native English counterparts. Also, Japanese English sentence-initial words show a lower power peak, and sentence-final words a higher power peak, than do their native English counterparts. Detailed analyses reveal a correlation between the above results and subjects' English proficiency. The results for word class can be explained as a result of the Japanese speakers inserting Japanese focus into English utterances. The results for word position are explained as follows: while sentence-initial strengthening does not affect sentence-initial power peak in Japanese English, an irregularity of final lengthening affects sentence-final intensity.
TECHNICAL REPORT
  • Junichi Mori, Fumiaki Satoh, Sakae Yokoyama, Hideki Tachibana
    2014 Volume 35 Issue 1 Pages 50-54
    Published: January 01, 2014
    Released on J-STAGE: January 01, 2014
    JOURNAL FREE ACCESS
    The municipal public address (M.P.A.) system for disaster prevention is an important information facility in communities. The speech intelligibility of such a system, however, tends to be degraded by multipass echoes with long time delays owing to reflections from nearby buildings and by the sounds from loudspeakers covering other subareas. Designing such an M.P.A. system therefore calls for an effective tool for predicting outdoor sound propagation. For this purpose, the authors have been investigating the applicability of a computer modeling technique based on geometrical acoustics. To assess the effectiveness of the modeling technique, two case studies were examined by comparing impulse responses (echo diagrams) calculated by computer modeling with those obtained by field measurements. It was found that this modeling technique can be effectively applied to the prediction of outdoor sound propagation and the basic design of M.P.A. systems.
ACOUSTICAL LETTERS