Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
Volume 42, Issue 1
PAPERS
  • Yuki Saito, Taiki Nakamura, Yusuke Ijima, Kyosuke Nishida, Shinnosuke ...
    2021 Volume 42 Issue 1 Pages 1-11
    Published: January 01, 2021
    Released on J-STAGE: January 01, 2021
    JOURNAL FREE ACCESS

    We propose non-parallel and many-to-many voice conversion (VC) using variational autoencoders (VAEs) that constructs VC models for converting arbitrary speakers' characteristics into those of other arbitrary speakers without parallel speech corpora for training the models. Although VAEs conditioned on one-hot speaker codes can achieve non-parallel VC, the phonetic contents of the converted speech tend to vanish, resulting in degraded speech quality. Another issue is that they cannot deal with unseen speakers not included in the training corpora. To overcome these issues, we incorporate deep-neural-network-based automatic speech recognition (ASR) and automatic speaker verification (ASV) into the VAE-based VC. Since phonetic contents are given as phonetic posteriorgrams predicted by the ASR models, the proposed VC can overcome the quality degradation. Our VC utilizes d-vectors extracted from the ASV models as continuous speaker representations that can deal with unseen speakers. Experimental results demonstrate that our VC outperforms the conventional VAE-based VC in terms of mel-cepstral distortion and converted speech quality. We also investigate the effects of hyperparameters in our VC and reveal that 1) a large d-vector dimensionality that gives better ASV performance does not necessarily improve converted speech quality, and 2) a large number of pre-stored speakers improves the quality.
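
    The mel-cepstral distortion named in the abstract above is commonly computed per frame as (10 / ln 10) * sqrt(2 * sum of squared coefficient differences), then averaged over time-aligned frames. The sketch below assumes that common definition; the abstract does not state which coefficients or which alignment the authors used, and the function names are hypothetical.

```python
import math


def mel_cepstral_distortion(ref, conv):
    """Return the MCD in dB for one pair of mel-cepstral vectors.

    Assumption: the caller has already dropped the 0th (energy)
    coefficient if it should be excluded.
    """
    if len(ref) != len(conv):
        raise ValueError("vectors must have the same length")
    squared_diff = sum((r - c) ** 2 for r, c in zip(ref, conv))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * squared_diff)


def mean_mcd(ref_frames, conv_frames):
    """Average the per-frame MCD over time-aligned frame pairs."""
    if not ref_frames or len(ref_frames) != len(conv_frames):
        raise ValueError("need two non-empty, equally long frame lists")
    total = sum(
        mel_cepstral_distortion(r, c)
        for r, c in zip(ref_frames, conv_frames)
    )
    return total / len(ref_frames)
```

    A lower value means the converted features are closer to the reference; identical frames give 0 dB.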

  • Ryo Teraoka, Shuichi Sakamoto, Zhenglie Cui, Yôiti Suzuki, Satoshi Shi ...
    2021 Volume 42 Issue 1 Pages 12-21
    Published: January 01, 2021
    Released on J-STAGE: January 01, 2021
    JOURNAL FREE ACCESS

    Human listeners can readily extract sounds of interest from distracting sounds by directing their auditory spatial attention. Although the extent to which auditory spatial attention influences listening performance and its spatial distribution in daily situations is important, the characteristics of this ability remain unclear. To investigate the characteristics of auditory spatial attention, we measured the word intelligibility (4-mora words) and detection threshold of a target sound (1/12 octave-band noise burst) in the presence of distractor sounds (speech sounds/noises with the same bandwidth but with different center frequencies). In the experiment, we presented a target and multiple distractors simultaneously from loudspeakers surrounding the listeners. Results showed that word intelligibility improved when the target direction was attended compared to when it was not, whereas the detection threshold of the narrow-band noise was not influenced significantly by attention. These findings suggest that we can observe the effect of auditory selective attention when the listeners continuously direct their attention to a specific direction. Moreover, the spatial pattern of word intelligibility showed a peak corresponding to the attended direction. By contrast, the threshold of the narrow-band noise was constant regardless of the direction from which the target was presented.

  • Takumi Asakura
    2021 Volume 42 Issue 1 Pages 22-35
    Published: January 01, 2021
    Released on J-STAGE: January 01, 2021
    JOURNAL FREE ACCESS

    To develop a numerical method of determining the time and frequency characteristics of the external forces excited by various impact sources, the collision between a free-falling mass and an elastic plate-like structure is modeled by a one-degree-of-freedom contact model built on a discrete wave-based numerical analysis of bending vibration with the finite-difference time-domain (FDTD) method. Moreover, the calculated impact sounds were auralized and applied to an evaluation experiment on the subjective impression of loudness and annoyance. As a basic study, the excitation characteristics of generally used devices (that is, the impact ball, bang machine, and tapping machine) and of human trotting are simulated and used in a numerical simulation targeting the prediction of the impact sound pressure levels inside a wall-type concrete structure. After that, the proposed method was applied to a subjective evaluation experiment on the loudness and annoyance caused by floor-impact sound on various types of floor slab. The results showed that the loudness and annoyance induced by the floor-impact sound were highly correlated with the maximum A-weighted sound pressure level.

  • Yizhen Zhou, Yosuke Nakamura, Ryoko Mugitani, Junji Watanabe
    2021 Volume 42 Issue 1 Pages 36-45
    Published: January 01, 2021
    Released on J-STAGE: January 01, 2021
    JOURNAL FREE ACCESS

    The goal of this study was to demonstrate the influence of prior auditory and visual information on speech perception, using a priming paradigm to investigate the shift in the perceptual boundary of geminate consonants. Although previous research has shown that visual information such as photographs influences the perception of spoken words, the effects of auditory and visual (written or illustrated) information have not been directly compared. In the present study, native Japanese speakers judged whether or not a spoken word was a geminate word after hearing/seeing a prime word/pseudoword that contained either a singleton or a geminate feature. The results indicate that spoken words, written words, and even illustrations presented prior to the target sounds can guide the boundary shift in Japanese geminate perception. Significantly, the influence of auditory information is independent of the lexical status of the primes; that is, both word and pseudoword auditory primes with geminate sound features induced a significant bias. On the other hand, visual primes induced the bias only when the primes coincided lexically with the targets, indicating that the influence of visual information on geminate perception differs from that of auditory information.

TECHNICAL REPORTS
  • Junji Yoshida, Ibuki Hatta
    2021 Volume 42 Issue 1 Pages 46-49
    Published: January 01, 2021
    Released on J-STAGE: January 01, 2021
    JOURNAL FREE ACCESS

    A robot vacuum cleaner, which cleans rooms automatically, is a useful appliance that helps with housework. Its radiated sound should be soft for comfortable living. In terms of the hearing situation, we hear the sound passively because a robot vacuum cleaner moves automatically without external control input. On the other hand, when we use a conventional vacuum cleaner, the sound is perceived actively because we operate it ourselves. This difference in hearing attitude may affect the perception of the sound. In this study, we attempted to clarify the difference in the degree of discomfort under the different hearing conditions by subjective evaluation tests. In the tests, we prepared the two above-mentioned hearing conditions for the evaluation of the radiated sound. The results showed that the participants felt more uncomfortable under the passive hearing condition than under the active hearing condition.

  • Thanh Loc Bui, Thu Lan Nguyen, Makoto Morinaga, Takashi Morihara, Yasu ...
    2021 Volume 42 Issue 1 Pages 50-57
    Published: January 01, 2021
    Released on J-STAGE: January 01, 2021
    JOURNAL FREE ACCESS

    Noise maps provide a basis for land-use and flight path planning to limit the noise impact on residents around airports. This study is one of the first attempts to assess an appropriate method of creating noise maps for airports in Vietnam. In this study, the Lden around Noi Bai International Airport (NBIA) was predicted by using the Integrated Noise Model (INM) with the Noise-Power-Distance (NPD) data available in INM and NPD data for military aircraft derived from field measurements. In addition, to assess the validity of the prediction, the predicted Lden was compared with the measured Lden, which was determined by field measurements conducted at ten residential sites around NBIA in November 2017. The noise levels were estimated for three cases: (1) civil aircraft only, using INM's NPD; (2) civil and military aircraft, using INM's NPD for military aircraft; (3) civil and military aircraft, using measurement-based NPD for military aircraft. Comparing the root mean square error between the predicted and measured values showed that the prediction in Case 3 is the most consistent with the measured Lden. In other words, the prediction validity was improved by using measurement-based NPD for military aircraft.
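
    The comparison described in the abstract above rests on two quantities: Lden and the root mean square error (RMSE) between predicted and measured levels. The sketch below assumes the widely used Lden form with +5 dB and +10 dB penalties for evening and night; the period lengths default to 12/4/8 hours, which is an assumption, since the abstract does not state the time periods applied in the study. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
import math


def lden(l_day, l_evening, l_night, hours=(12.0, 4.0, 8.0)):
    """Day-evening-night level in dB from the three period levels.

    Assumption: `hours` gives the day, evening and night durations
    and sums to 24; other conventions can be passed explicitly.
    """
    h_day, h_evening, h_night = hours
    if abs(h_day + h_evening + h_night - 24.0) > 1e-9:
        raise ValueError("period durations must sum to 24 hours")
    energy = (
        h_day * 10.0 ** (l_day / 10.0)
        + h_evening * 10.0 ** ((l_evening + 5.0) / 10.0)  # evening penalty
        + h_night * 10.0 ** ((l_night + 10.0) / 10.0)     # night penalty
    )
    return 10.0 * math.log10(energy / 24.0)


def rmse(predicted, measured):
    """Root mean square error between paired predicted and measured values."""
    if not predicted or len(predicted) != len(measured):
        raise ValueError("need two non-empty, equally long sequences")
    squared = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return math.sqrt(squared / len(predicted))


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    measured = [lden(60.0, 55.0, 50.0), lden(62.0, 58.0, 52.0)]
    predicted = [59.0, 63.5]
    print(f"RMSE = {rmse(predicted, measured):.2f} dB")
```

    Computing the RMSE once per prediction case over the same measurement sites gives one number per case, and the case with the smallest value agrees best with the measurements.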

ACOUSTICAL LETTERS