Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
Advance online publication
  • Sei Ueno, Akinobu Lee
    Article ID: e23.70
    Published: 2024
    Advance online publication: February 29, 2024

    This paper presents simple multi-setting log Mel-scale filter bank (lmfb) training methods that narrow the gap between real and synthesized speech in data augmentation for automatic speech recognition (ASR). Although end-to-end ASR suffers from a shortage of real speech data, its performance has been significantly improved by data synthesis with a text-to-speech (TTS) system. However, speech generated by a TTS model is often monotonous and lacks the natural variation of real speech, which degrades ASR performance. To mitigate this problem, we propose using multi-setting lmfb features in the data augmentation scheme. The lmfb features are extracted under multiple short-time Fourier transform (STFT) parameter settings, taken from parameters that are well established for ASR and TTS tasks. We also propose training a single TTS model on multi-setting lmfb features, with the setting ID embedded in the text-to-Mel network. Experimental evaluations showed that both proposed multi-setting training methods achieved better ASR performance than the baseline single-setting augmentation methods.

  • Kanta Nakamura, Naho Konoike, Takeshi Nishimura
    Article ID: e23.85
    Published: 2024
    Advance online publication: February 16, 2024

    The tongue plays a major role in speech production. Comparisons of tongue muscle fiber architecture between humans and nonhuman primates are required to understand how tongue deformability was acquired in the evolution of human speech. In this study, we performed diffusion-weighted imaging of flash-frozen tongue specimens from macaques, a representative animal model, to visualize the three-dimensional architecture of the intrinsic muscles. The procedures and scanning methods used in this study can also be applied to non-model animals, and are expected to yield quantitative data on their tongue architecture that help clarify the evolutionarily derived features of human tongue deformability.

  • Yuki Saito, Kohei Yatabe, Shogun
    Article ID: e23.67
    Published: 2023
    Advance online publication: December 02, 2023

    Understanding gameplay can enhance the experience and entertainment value of video games. In this study, we propose using the sound generated by a controller to analyze gameplay information. Controller sound is a convenient gameplay-related feature because it can be recorded very easily. As a first step, we identified characters in Super Smash Bros. Ultimate from controller sound alone, as an example task for examining whether controller sound contains valuable information. The results showed that our model achieved 79% accuracy in identifying five characters using only controller sound.

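
The multi-setting lmfb extraction described in the first abstract (Article ID: e23.70) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the two STFT settings, the 16 kHz sampling rate, the 80 Mel channels, and all function names below are assumptions chosen only to show how one waveform yields several lmfb feature sequences, each tagged with a setting ID.

```python
import numpy as np

# Hypothetical STFT settings (n_fft, win_length, hop_length), indexed by setting ID.
# These values are illustrative assumptions, not the parameters used in the paper.
SETTINGS = {
    0: (512, 400, 160),    # ASR-style: 25 ms window, 10 ms hop at 16 kHz
    1: (1024, 1024, 256),  # TTS-style: longer window and hop
}

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel(signal, sr, n_fft, win_length, hop_length, n_mels=80, eps=1e-10):
    """Log Mel-scale filter bank features, shape (n_frames, n_mels)."""
    window = np.hanning(win_length)
    n_frames = 1 + (len(signal) - win_length) // hop_length
    frames = np.stack([
        signal[t * hop_length : t * hop_length + win_length] * window
        for t in range(n_frames)
    ])
    # rfft zero-pads each windowed frame up to n_fft samples.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    return np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + eps)

def extract_multi_setting(signal, sr):
    """Return a list of (setting_id, lmfb_features), one entry per STFT setting."""
    return [
        (sid, log_mel(signal, sr, n_fft, win, hop))
        for sid, (n_fft, win, hop) in SETTINGS.items()
    ]
```

In this sketch, each (setting ID, feature) pair would act as a separate augmented view of the same utterance. The abstract's second proposal, embedding the setting ID in the text-to-Mel network, is not modeled here.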