Acoustical Science and Technology

INVITED REVIEWS

Nonnegative matrix factorization based on complex generative model

Daichi Kitamura

2019 Volume 40 Issue 3 Pages 155-161
Published: May 01, 2019
Released on J-STAGE: May 01, 2019

DOI: https://doi.org/10.1250/ast.40.155

JOURNAL FREE ACCESS

Nonnegative matrix factorization (NMF) is a powerful technique for extracting meaningful patterns from an observed matrix and has been used in many applications in the audio signal processing field. In this article, the principle of NMF and some extensions based on a complex generative model are reviewed. Their application to audio source separation is also presented.

Speech enhancement using harmonic-structure-based phase reconstruction

Yukoh Wakabayashi

2019 Volume 40 Issue 3 Pages 162-169
Published: May 01, 2019
Released on J-STAGE: May 01, 2019

DOI: https://doi.org/10.1250/ast.40.162

JOURNAL FREE ACCESS

Recent work has shown that phase information is useful for further improving the performance of speech enhancement, source separation, and speech synthesis. In the speech enhancement field, the combination of amplitude and phase estimations improves the perceived quality more than only amplitude estimation. In this paper, we review two harmonic-structure-based phase estimation methods with temporal and frequency constraints on the harmonic speech phase. In addition, we describe important parameters for phase estimation, such as the frame shift length and window function of the short-time Fourier transform. Subjective experiments using listening tests and future work for phase processing are briefly described.

Representation of complex spectrogram via phase conversion

Kohei Yatabe, Yoshiki Masuyama, Tsubasa Kusano, Yasuhiro Oikawa

2019 Volume 40 Issue 3 Pages 170-177
Published: May 01, 2019
Released on J-STAGE: May 01, 2019

DOI: https://doi.org/10.1250/ast.40.170

JOURNAL FREE ACCESS

As the importance of the phase of the complex spectrogram has become widely recognized, many techniques have been proposed for handling it. However, several definitions and terminologies for the same concept can be found in the literature, which has confused beginners. In this paper, two major definitions of the short-time Fourier transform and their phase conventions are summarized to alleviate this confusion. A phase-aware signal-processing scheme based on phase conversion is also introduced with a set of executable MATLAB functions (https://doi.org/10/c3qb).

PAPERS

Modeling of angklung to determine its pitch frequency

Pepen Arifin, Idham Pribadi

2019 Volume 40 Issue 3 Pages 178-185
Published: May 01, 2019
Released on J-STAGE: May 01, 2019

DOI: https://doi.org/10.1250/ast.40.178

JOURNAL FREE ACCESS

The angklung is an Indonesian traditional musical instrument made entirely of bamboo. It usually consists of two or three rattle tubes that generate sound by vibrating the tubes. The generated sound is resonated by a rattle resonance tube to make it louder. The rattle tube is carved in a traditional way from a piece of bamboo with a certain length and diameter that are passed from generation to generation to produce the desired tone. In this investigation, we develop a mathematical model of sound generation by a rattle tube and formulate an equation for the frequency of the vibrated rattle tube from its physical and geometrical parameters. Since the rattle tube is not perfectly cylindrical, the frequency of the vibrated rattle tube is derived from the frequency equation for a perfectly cylindrical tube with a modification of the geometrical parameters to make them appropriate for the shape of the rattle tube. This equation can determine the tone frequency for given geometrical parameters of the tube and explain the relationship between the generated tone frequency and the resonant frequency. The model also shows that the discrepancy between the calculated and generated frequencies of the rattle tube is within the response of human ears.

Modal decomposition of musical instrument sounds via optimization-based non-linear filtering

Yoshiki Masuyama, Tsubasa Kusano, Kohei Yatabe, Yasuhiro Oikawa

2019 Volume 40 Issue 3 Pages 186-197
Published: May 01, 2019
Released on J-STAGE: May 01, 2019

DOI: https://doi.org/10.1250/ast.40.186

JOURNAL FREE ACCESS

For musical instrument sounds containing partials, which are referred to as modes, the decaying processes of the modes significantly affect the timbre of musical instruments and characterize the sounds. However, their accurate decomposition around the onset is not an easy task, especially when the sounds have sharp onsets and contain non-modal percussive components such as the attack. This is because the sharp onsets of modes comprise peaky but broad spectra, which makes it difficult to get rid of the attack component. In this paper, an optimization-based method of modal decomposition is proposed to overcome this difficulty. The proposed method is formulated as a constrained optimization problem to enforce the perfect reconstruction property, which is important for accurate decomposition, and the causality of modes. Three numerical simulations and an application to real piano sounds confirm the performance of the proposed method.

Reversal of relationship between impression of voice pitch and height of fundamental frequency: Its appearance and disappearance

Teruhisa Uchida

2019 Volume 40 Issue 3 Pages 198-208
Published: May 01, 2019
Released on J-STAGE: May 01, 2019

DOI: https://doi.org/10.1250/ast.40.198

JOURNAL FREE ACCESS

This study investigated the cognitive biases related to the impression of voice pitch caused by changes in tonal quality. According to the vocal tube model, changing the vocal-tract length (VTL) systematically alters the tonal quality. In one experiment, the fundamental frequency (f_o) of the speech samples was raised and lowered on a mel-scale axis. Then the spectral-frequency scale was expanded and contracted to simulate reducing and increasing the VTL. In a second experiment, the width of the f_o range was changed in addition to changing the f_o height and VTL scaling. Noise-vocoded speech samples were generated to measure the independent effects of the VTL scaling. The participants rated their impressions of the pitch using paired comparison. The results revealed a reversal of the relationship between impression of voice pitch and height of f_o when the effects of f_o height and VTL scaling on pitch impression were opposite to each other and when the range of the f_o contour was equivalent to that of natural speech. VTL scaling played a dominant role in this reversal. However, as the f_o contour became flat, this reversal phenomenon disappeared, and the f_o height factor came to play the dominant role.

Vector-sensor array direction-of-arrival estimation exploiting spatial time-frequency structure based on joint approximate diagonalization

Hai-Yan Song, Chang-Yi Yang, Ke-Jun Wang

2019 Volume 40 Issue 3 Pages 209-216
Published: May 01, 2019
Released on J-STAGE: May 01, 2019

DOI: https://doi.org/10.1250/ast.40.209

JOURNAL FREE ACCESS

By making use of the extra particle velocity information, an array of vector sensors can achieve better direction-of-arrival (DOA) estimation performance than a conventional array of pressure sensors. However, most previous work on DOA estimation with vector-sensor arrays uses only the time-space statistical information available in the array signals and does not exploit the difference in the time-frequency signatures of the sources. In this paper, we develop a new approach that exploits the inherent time-frequency-space characteristics of the underlying vector-sensor array signal to achieve better DOA estimation performance even in a noisy and coherent environment with few snapshots. Our approach is based on spatial time-frequency distribution (STFD) information and efficiently combines all of the relevant STFD points by joint approximate diagonalization, for example using Jacobi rotations, to reduce the effect of noise and achieve the desired angular resolution. Computer simulations of several frequently encountered scenarios, such as multiple closely spaced coherent sources, indicate the superior DOA estimation resolution of the proposed approach compared with existing techniques. In addition, the statistical performance of the proposed approach is examined by evaluating the root mean square error (RMSE) versus SNR, number of snapshots, and number of sensors, which demonstrates its higher DOA estimation accuracy.

ACOUSTICAL LETTERS

Acoustic spatial radiation characteristics of a violin made of Japanese cedar wood produced in Nara Prefecture

Katuhiro Maki, Maiko Ariyama

2019 Volume 40 Issue 3 Pages 217-220
Published: May 01, 2019
Released on J-STAGE: May 01, 2019

DOI: https://doi.org/10.1250/ast.40.217

JOURNAL FREE ACCESS

Note on microperforated panel model using equivalent-fluid-based absorption elements

Takeshi Okuzono, Takao Nitta, Kimihiro Sakagami

2019 Volume 40 Issue 3 Pages 221-224
Published: May 01, 2019
Released on J-STAGE: May 01, 2019

DOI: https://doi.org/10.1250/ast.40.221

JOURNAL FREE ACCESS

Aging-related attention deficits in frequency discrimination amid task-irrelevant stimulus differences

Blas Espinoza-Varas, Praveen Jajoria, Hyunsook Jang

2019 Volume 40 Issue 3 Pages 225-228
Published: May 01, 2019
Released on J-STAGE: May 01, 2019

DOI: https://doi.org/10.1250/ast.40.225

JOURNAL FREE ACCESS
