Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
Volume 41, Issue 2
INVITED REVIEWS
  • Tomoki Koriyama
    2020 Volume 41 Issue 2 Pages 457-464
    Published: March 01, 2020
    Released on J-STAGE: March 01, 2020
    JOURNAL FREE ACCESS

    A Gaussian process (GP) is a distribution over functions that can be used as a machine learning framework. GP regression combines the characteristics of a Bayesian model, which can predict the uncertainty of its outputs, with those of kernel methods, which can represent nonlinear functions with a small number of parameters. In this paper, we first describe the basics of GP regression and then introduce notable recent advances in GPs. Specifically, we focus on the stochastic variational GP, an approximation method that can handle very large amounts of training data, and explain a GP-based deep architecture called the deep Gaussian process. Because GP regression is a general-purpose machine learning framework, it has many applications. Here we introduce GP-based applications in speech information processing, including speech synthesis.
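
    As an illustration of the basic GP regression mentioned in the abstract, the following is a minimal sketch of exact GP prediction. It is not taken from the paper: the squared-exponential kernel, the zero prior mean, the hyperparameter values, and the function names `rbf_kernel` and `gp_posterior` are all assumptions chosen for the example.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays (assumed kernel choice)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and variance of a zero-mean GP at the test inputs."""
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    k_ss_diag = np.diag(rbf_kernel(x_test, x_test))
    # A Cholesky factorization avoids forming an explicit matrix inverse.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha
    v = np.linalg.solve(chol, k_ts)
    var = k_ss_diag - np.sum(v ** 2, axis=0)
    return mean, var

# Toy data (hypothetical): noisy-free samples of a sine curve.
x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_train = np.sin(x_train)
x_test = np.array([0.5, 5.0])  # one point inside the data, one far outside
mean, var = gp_posterior(x_train, y_train, x_test)
```

    The predictive variance is small near the training inputs and approaches the prior variance far from them, which is the uncertainty estimate the abstract refers to.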

  • Ryo Aihara, Gordon Wichern, Jonathan Le Roux
    2020 Volume 41 Issue 2 Pages 465-471
    Published: March 01, 2020
    Released on J-STAGE: March 01, 2020
    JOURNAL FREE ACCESS

    The recently proposed deep clustering algorithm brought significant advances in single-channel, speaker-independent, multi-speaker speech separation. In this paper, we review deep clustering and its improved variant, called chimera net. In addition, we describe our architectures for reducing the latency of deep clustering by combining block processing and teacher-student learning. We also describe the unfolding of a phase reconstruction algorithm and a complex mask estimation method for speech separation.

PAPERS
  • Megumi Matsui
    2020 Volume 41 Issue 2 Pages 472-480
    Published: March 01, 2020
    Released on J-STAGE: March 01, 2020
    JOURNAL FREE ACCESS

    The purpose of the present study was to examine the effects of the pitch of short-duration pure tones on onomatopoeic expressions, as well as the common relationships between frequency and onomatopoeic expressions. The sound stimuli were 85 pure tones (duration: 240 ms) that spanned 7 octaves from 62.5 Hz to 8 kHz in 1/12-octave steps. Each stimulus was presented to the participants twice in random order, and the participants were asked to write or choose an onomatopoeic expression for what they heard. The results indicated that the participants tended to use /u/ or /o/ for low frequencies and /i/ for high frequencies, and that the distribution of vowels tended to be similar for speakers of different languages. Therefore, for vowels, there are common relationships between the pitch of pure tones and onomatopoeic expressions.
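
    The stimulus count in the abstract follows from the frequency range and step size: 7 octaves at 12 steps per octave give 84 intervals, hence 85 tones. A short sketch of that arithmetic follows; the variable names are illustrative and not from the paper.

```python
# Frequencies in 1/12-octave steps from 62.5 Hz across 7 octaves.
BASE_HZ = 62.5
STEPS_PER_OCTAVE = 12
OCTAVES = 7

n_tones = OCTAVES * STEPS_PER_OCTAVE + 1  # 84 intervals plus the starting tone
freqs = [BASE_HZ * 2 ** (k / STEPS_PER_OCTAVE) for k in range(n_tones)]

print(n_tones, freqs[0], freqs[-1])
```

    The last frequency is 62.5 Hz multiplied by 2 to the 7th power, which is 8 kHz, matching the range stated in the abstract.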

  • Katuhiro Maki, Maiko Ariyama
    2020 Volume 41 Issue 2 Pages 481-488
    Published: March 01, 2020
    Released on J-STAGE: March 01, 2020
    JOURNAL FREE ACCESS

    We measured the vibration characteristics of Japanese cedar (Cryptomeria japonica) from Nara Prefecture, whose narrow, uniform, and straight wood grain gives it a suitable appearance for violin tops. We then compared them with those of spruce (Picea spp.) to evaluate the potential utility of Nara cedar as an alternative to spruce for the soundboards of string instruments, including violin tops. An evaluation with multiple indices based on material density, specific dynamic modulus of elasticity, and loss tangent showed that Nara cedar is similarly suitable to spruce species for soundboards. In addition, a statistical equivalence test on the vibration-characteristic indices of Nara cedar and spruce supports this finding. We were thus able to identify a possible native supply of wood with suitable characteristics for string instrument construction in Japan.

  • Kimitaka Tsutsumi, Kenta Imaizumi, Yoichi Haneda, Hideaki Takada
    2020 Volume 41 Issue 2 Pages 489-500
    Published: March 01, 2020
    Released on J-STAGE: March 01, 2020
    JOURNAL FREE ACCESS

    We propose a method to create a directional sound source in front of a linear loudspeaker array. The method uses the array to create clusters of focused sources that form multipoles, and superposes the multipoles to synthesize a directivity pattern. We also derive an efficient multipole structure in which adjacent lower-order multipoles overlap. The structure reduces the number of focused sources, thereby reducing the algorithmic complexity needed to create them. To further reduce complexity, we also derive a time-domain implementation of the proposed method. To mitigate the degradation in the reproduced directivity caused by superposing the inaccurate sound fields of focused sources, fractional delay interpolation is applied. Computer simulation results indicate that the proposed method, based on the superposition of multipoles up to the third order, creates a directional sound source at significantly lower complexity than a conventional method.
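
    To make the idea of a time-domain focused source with fractional delays concrete, the following is a simplified sketch and not the authors' method. It assumes a plain delay-based focusing rule (all wavefronts arrive at the focus simultaneously), a speed of sound of 343 m/s, and first-order linear interpolation for the fractional part of each delay; the function names and the array geometry are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def focusing_delays(speaker_x, focus, sample_rate):
    """Delay per loudspeaker, in samples, so all wavefronts reach `focus` together.

    speaker_x: x positions (m) of loudspeakers on the line y = 0.
    focus: (x, y) position (m) of the focused source, with y > 0.
    """
    fx, fy = focus
    dists = [math.hypot(x - fx, fy) for x in speaker_x]
    farthest = max(dists)
    # The farthest loudspeaker fires first (zero delay); nearer ones wait.
    return [(farthest - d) / SPEED_OF_SOUND * sample_rate for d in dists]

def fractional_delay(signal, delay):
    """Delay a signal by a non-integer number of samples (linear interpolation)."""
    whole = int(math.floor(delay))
    frac = delay - whole
    out = [0.0] * (len(signal) + whole + 1)
    for n, sample in enumerate(signal):
        out[n + whole] += (1.0 - frac) * sample
        out[n + whole + 1] += frac * sample
    return out

# Hypothetical five-element array with 10 cm spacing, focus 50 cm in front.
speakers = [-0.2, -0.1, 0.0, 0.1, 0.2]
delays = focusing_delays(speakers, focus=(0.0, 0.5), sample_rate=48000)
delayed_impulse = fractional_delay([1.0], 2.25)
```

    Rounding each delay to a whole sample would shift the individual wavefronts slightly, which is one way the superposed field can become inaccurate; interpolating between neighboring samples reduces that error at low cost.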

  • Jihyeon Yun, Takayuki Arai
    2020 Volume 41 Issue 2 Pages 501-512
    Published: March 01, 2020
    Released on J-STAGE: March 01, 2020
    JOURNAL FREE ACCESS

    Previous research reported that Korean nasal consonants can be denasalized in word-initial position. This study examined how native Korean listeners perceive the word-initial nasal onset /n/, using /Ca/ stimuli synthesized with a Klatt synthesizer. We tested the effects of consonant duration, consonant nasality, and vowel nasalization on perception. In a rating experiment, listeners evaluated the goodness of the stimuli as /na/ on a seven-point scale. The participants generally gave favorable ratings to the stimuli with nasalized vowels. Two-thirds of the participants rated the stimuli with no nasality as good exemplars of /na/, whereas the other listeners did not. In a yes-no experiment, participants judged whether the stimuli were /na/ or not. Their responses were similar to those in the rating experiment. Many listeners gave positive /na/ responses even to the stimuli with zero voice onset time, yet the stimuli with longer prevoicing or nasal murmur were more likely to be perceived as /na/. Vowel nasality affected the perception of /na/, although some listeners preferred oral vowels over nasalized vowels when evaluating /na/-likeness.

ACOUSTICAL LETTERS