Acoustical Science and Technology

PAPERS

Effect of horizontal sound-absorptive strips inside closed rooms

Takumi Asakura, Wataru Yashima, Fumiaki Satoh

2020 Volume 41 Issue 5 Pages 709-719
Published: September 01, 2020
Released on J-STAGE: September 01, 2020

DOI: https://doi.org/10.1250/ast.41.709

JOURNAL FREE ACCESS


The sound absorption characteristics of horizontally arranged sound-absorptive strips on walls were evaluated objectively using acoustical indices determined from impulse responses calculated by finite-difference time-domain simulation. The subjective effect of the horizontal sound-absorptive strips (HSSs) was also investigated using Scheffe's paired comparison method. The results of the numerical case study confirmed that the frequency characteristics of the acoustical indices of rooms with HSSs changed significantly depending on the relative positions of the source and receiving points and on the mounting height of the strips. A subjective evaluation experiment also clarified how different strip arrangements differ in their absorption effect on the reverberation inside rooms.

Vocal-tract spectrum estimation method affects the articulatory compensation in formant transformed auditory feedback

Yasufumi Uezu, Sadao Hiroya, Takemi Mochida

2020 Volume 41 Issue 5 Pages 720-728
Published: September 01, 2020
Released on J-STAGE: September 01, 2020

DOI: https://doi.org/10.1250/ast.41.720

JOURNAL FREE ACCESS


Auditory feedback plays a crucial role in the stable control of speaking and singing. Formant-transformed auditory feedback (TAF) is used to investigate the relationship between perturbations to the formant frequency and the compensatory response, in order to clarify the mechanism of auditory-speech motor control. Although previous studies of formant TAF applied linear predictive coding (LPC) to estimate formant frequencies, LPC estimates false formants for high-pitched voices. In this paper, we investigate how different vocal-tract spectrum estimation methods in real-time formant TAF affect the compensatory response of formant frequencies to perturbations. A phase equalization-based autoregressive exogenous model (PEAR) is applied to the TAF system as a formant estimation method that can estimate the formant frequency more accurately and robustly than LPC can. Fifteen native Japanese speakers were asked to repeat the Japanese syllables /he/ or /hi/ while receiving feedback sounds whose formants F1 and F2 were transformed. In the /he/ condition, the F1 compensatory response for PEAR was significantly larger than that for LPC, and the compensation error in the F1–F2 plane for PEAR was smaller than that for LPC. Our results suggest that PEAR can increase both the accuracy of formant frequency estimation and the naturalness of the transformed speech sound.

Pilot study on acceptable sound levels in scenic areas

Koji Nagahata, Hayato Akanuma, Akina Oka, Yukako Shoji

2020 Volume 41 Issue 5 Pages 729-738
Published: September 01, 2020
Released on J-STAGE: September 01, 2020

DOI: https://doi.org/10.1250/ast.41.729

JOURNAL FREE ACCESS


The intrusion of road traffic noise in scenic areas is one of the key issues in managing acoustic quality. Several studies focused on acceptable sound levels for road traffic noise in such areas; however, most of them estimated acceptable sound levels from the dose-response relationship between sound levels and annoyance or evaluation of acoustic comfort, and few studies investigated acceptable sound levels directly. We directly investigated the acceptable sound levels for road traffic noise in scenic areas in Japan by conducting psycho-acoustic experiments involving a group of participants. Two simulated road traffic noises were used as target sounds, and four audio and video recordings were used as background conditions. By a method of adjustment, the participants were required to adjust the playback level of each target to a maximum acceptable level while comparing the background sound levels. The results showed that the acceptable sound levels cannot be explained by a simple value or a simple signal-to-noise ratio (SNR). There is a clear tendency that a higher SNR, which means that road traffic noise can be heard more clearly, is acceptable in a quieter area. The acceptable sound levels of scenic areas are largely dependent on the evaluators and features of the areas.

Analysis of flow and acoustic radiation in reed instruments by compressible flow simulation

Hiroshi Yokoyama, Masanori Kobayashi, Akiyoshi Iida

2020 Volume 41 Issue 5 Pages 739-750
Published: September 01, 2020
Released on J-STAGE: September 01, 2020

DOI: https://doi.org/10.1250/ast.41.739

JOURNAL FREE ACCESS


Direct aeroacoustic simulations of flow and sound around an instrument with an oscillating reed were performed on the basis of compressible Navier–Stokes equations along with experiments with an artificial blowing device. The measured reed displacement was utilized as forced vibration in the computations. The predicted sound pressure spectrum shows that the level of the fundamental tone almost agrees with the measured result. The numerical results showed that the lowest acoustic mode of clarinet-type reed instruments (one-quarter wavelength mode) was reproduced. Moreover, the sound generation mechanism was discussed in detail using the predicted gradient of mass flow rate in the instrument. It was found that compression and expansion occur inside the mouthpiece, where the flow separation occurs after the spreading of the air jet from the reed channel exit along the inner wall of the mouthpiece. In addition, vortex ring shedding attributable to the acoustic particle velocity around the open end of the instrument was found to occur, causing an expansion wave from the instrument.

Perception of Japanese length contrasts with reverberation by native and nonnative listeners

Eri Osawa, Takayuki Arai, Nao Hodoshima

2020 Volume 41 Issue 5 Pages 751-760
Published: September 01, 2020
Released on J-STAGE: September 01, 2020

DOI: https://doi.org/10.1250/ast.41.751

JOURNAL FREE ACCESS


The perception of segmental duration is crucial for the distinction of Japanese length contrasts. However, the perceived duration may be changed in a long reverberation, which adds a "tail" to sounds, making them perceived as being longer. In addition, since lengthened sounds overlap the following sounds, the boundaries of phonemes would become blurred. In the current study, we investigated whether the effects of reverberation distort the distinction of Japanese length contrasts for native Japanese and English listeners. Stimuli were nonword pairs (/baba/–/babaa/, /ata/–/atta/, and /ama/–/amma/) varying in duration along the continuum. The logistic function was used to model the perception. In the distinction of vowel length contrast in the word-final position, even native listeners identified the stimulus with the shortest vowel duration as a long vowel word with reverberation. Regarding the perception of the geminate nasal, "geminate" responses increased with reverberation for native listeners, whereas the results for nonnative listeners indicated that "singleton" responses increased with reverberation. It is assumed that the difference could be attributed to the different prototypes of categories of Japanese between native and nonnative listeners. In addition, the results for nonnative listeners might be attributed to the difference in prosody between English and Japanese.


TECHNICAL REPORTS

JSUT and JVS: Free Japanese voice corpora for accelerating speech synthesis research

Shinnosuke Takamichi, Ryosuke Sonobe, Kentaro Mitsui, Yuki Saito, Tomo ...

2020 Volume 41 Issue 5 Pages 761-768
Published: September 01, 2020
Released on J-STAGE: September 01, 2020

DOI: https://doi.org/10.1250/ast.41.761

JOURNAL FREE ACCESS


In this paper, we develop two corpora for speech synthesis research. Thanks to improvements in machine learning techniques, including deep learning, speech synthesis is becoming a machine learning task. To accelerate speech synthesis research, we aim to develop Japanese voice corpora that are reasonably accessible to not only academic institutions but also commercial companies. To that end, we construct the JSUT and JVS corpora, which are designed mainly for text-to-speech synthesis and voice conversion, respectively. The JSUT corpus contains 10 hours of reading-style speech uttered by a single speaker, and the JVS corpus contains 30 hours of speech in three styles uttered by 100 speakers. This paper describes how we designed the corpora and summarizes their specifications. The corpora are available at our project pages.

Effect of spectrogram resolution on deep-neural-network-based speech enhancement

Daiki Takeuchi, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Ha ...

2020 Volume 41 Issue 5 Pages 769-775
Published: September 01, 2020
Released on J-STAGE: September 01, 2020

DOI: https://doi.org/10.1250/ast.41.769

JOURNAL FREE ACCESS


In recent single-channel speech enhancement, deep neural networks (DNNs) have played an important role in achieving high performance. One standard use of a DNN is to construct a mask-generating function for time-frequency (T-F) masking. To apply a mask in the T-F domain, the short-time Fourier transform (STFT) is usually used because of its well-understood and invertible nature. While the mask-generating regression function has been studied for a long time, there is less research on the T-F transform from the viewpoint of speech enhancement. Since the performance of speech enhancement depends on both the T-F mask estimator and the T-F transform, investigating the T-F transform should be beneficial for designing a better enhancement system. In this paper, as a step toward an optimal T-F transform for speech enhancement, we experimentally investigated the effect of the STFT parameter settings on a DNN-based mask estimator. We conducted experiments using three types of DNN architectures with three types of loss functions, and the results suggested that U-Net is robust to the parameter settings, whereas fully connected and BLSTM networks are not.


ACOUSTICAL LETTERS

Characterization of intense pressure pulse through cylindrical hole

Koji Aizawa, Takumi Kobayashi

2020 Volume 41 Issue 5 Pages 776-779
Published: September 01, 2020
Released on J-STAGE: September 01, 2020

DOI: https://doi.org/10.1250/ast.41.776

JOURNAL FREE ACCESS

Assessing sensorimotor integration in adults who stutter by a behavioral task using perceptual adaptation of frequency-altered auditory feedback

Daichi Iimura, Nobuhiko Asakura, Takafumi Sasaoka, Toshio Inui

2020 Volume 41 Issue 5 Pages 780-783
Published: September 01, 2020
Released on J-STAGE: September 01, 2020

DOI: https://doi.org/10.1250/ast.41.780

JOURNAL FREE ACCESS

Diffuse-field sound absorption characteristics of a spherical-microperforated space absorber

Kimihiro Sakagami, Midori Kusaka, Takeshi Okuzono, Shigeyuki Kido, Dai ...

2020 Volume 41 Issue 5 Pages 784-787
Published: September 01, 2020
Released on J-STAGE: September 01, 2020

DOI: https://doi.org/10.1250/ast.41.784

JOURNAL FREE ACCESS

Language modeling in speech recognition for grammatical error detection based on neural machine translation

Jiang Fu, Yuya Chiba, Takashi Nose, Akinori Ito

2020 Volume 41 Issue 5 Pages 788-791
Published: September 01, 2020
Released on J-STAGE: September 01, 2020

DOI: https://doi.org/10.1250/ast.41.788

JOURNAL FREE ACCESS

Hand clapping in a group with external timing cue of clap sounds

Katuhiro Maki, Nao Ota

2020 Volume 41 Issue 5 Pages 792-795
Published: September 01, 2020
Released on J-STAGE: September 01, 2020

DOI: https://doi.org/10.1250/ast.41.792

JOURNAL FREE ACCESS

Pitch and duration as auditory cues to identify Japanese long vowels for Japanese learners

C. T. Justine Hui, Takayuki Arai

2020 Volume 41 Issue 5 Pages 796-799
Published: September 01, 2020
Released on J-STAGE: September 01, 2020

DOI: https://doi.org/10.1250/ast.41.796

JOURNAL FREE ACCESS

Classification of formant estimation methods in transformed auditory feedback experiments using convolutional neural networks

Fumiaki Taguchi, Sadao Hiroya, Yasufumi Uezu, Takemi Mochida

2020 Volume 41 Issue 5 Pages 800-803
Published: September 01, 2020
Released on J-STAGE: September 01, 2020

DOI: https://doi.org/10.1250/ast.41.800

JOURNAL FREE ACCESS

Perception analysis of inter-singer similarity in Japanese song

Hiroki Tamaru, Shinnosuke Takamichi, Hiroshi Saruwatari

2020 Volume 41 Issue 5 Pages 804-807
Published: September 01, 2020
Released on J-STAGE: September 01, 2020

DOI: https://doi.org/10.1250/ast.41.804

JOURNAL FREE ACCESS

