Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
Volume 46, Issue 1
—Special Issue on Speech Diversity and Its Applications—
Showing 1-20 of 20 articles in the selected issue
PAPERS
  • Shota Okubo, Toshiharu Horiuchi
    2025, Volume 46, Issue 1, pp. 1-10
    Published: 2025/01/01
    Released: 2025/01/01
    [Advance publication] Released: 2024/09/21
    Journal, Open Access

    The finite-difference time-domain (FDTD) method has been proposed and used for sound field simulation. To reproduce actual sound wave propagation in sound field simulations, it is necessary to apply the radiation characteristics. With the FDTD method, radiation characteristics can be applied by setting sound pressure in a dense grid arrangement. However, conventional techniques for capturing radiation characteristics use a sparse array of microphones and are considered insufficient for FDTD simulation. Furthermore, the technique required to apply captured acoustic signals in a dense grid arrangement with the FDTD method has not been considered. In this paper, we propose a novel hardware and software system that captures the radiation characteristics for a dense grid arrangement and applies them to the FDTD method, while controlling the sound wave propagation with a non-propagation region. With the proposed system, the average differences from measured values in sound pressure, propagation time, center frequency, and log-spectral distortion are 1.8 dB, 0.04 ms, 700 Hz, and 3.5 dB, respectively, which is more accurate than conventional techniques. These results show that the system is useful for improving the accuracy of sound wave propagation reproduction in sound field simulation.

  • Tong Zhou, Kazuya Yasueda, Ghada Bouattour, Anthimos Georgiadis, Akito ...
    2025, Volume 46, Issue 1, pp. 11-21
    Published: 2025/01/01
    Released: 2025/01/01
    [Advance publication] Released: 2024/09/27
    Journal, Open Access

    This study introduces bidirectional stepwise-based algorithms designed to optimize loudspeaker array configurations for Multizone Sound Field Reproduction systems. An initial arrangement selection method based on loudspeaker magnitude enhances the optimization process. These algorithms were validated using the Acoustic Contrast Control and Pressure Matching methods across free-field conditions and a comprehensive Room Impulse Response database covering various room conditions. Comparative experiments against traditional unidirectional iterative strategies demonstrate that the proposed algorithms significantly outperform existing methods in terms of efficiency and effectiveness, especially in configurations with fewer loudspeakers. For example, in a small meeting room with 16 loudspeakers, the stepwise-based approaches achieved higher acoustic contrast and required substantially fewer iterations than conventional methods. Specifically, optimization efficiency improved by about 55.2% and 77.8% in Acoustic Contrast Control and by 36.7% and 68.6% in Pressure Matching, compared to conventional approaches that iteratively add or remove loudspeakers.

  • Hikaru Miura
    2025, Volume 46, Issue 1, pp. 22-29
    Published: 2025/01/01
    Released: 2025/01/01
    [Advance publication] Released: 2024/10/05
    Journal, Open Access

    This paper describes the development of a compact ultrasonic vibration source with a transverse vibrating plate that can achieve large displacement amplitudes. An ultrasonic vibration source was designed in which the ultrasonic vibrator, excluding the transducer, was approximately the same length as the transducer (half the wavelength of the longitudinal vibration). To achieve this, the ultrasonic vibrator was integrated with the transverse vibrating plate and the amplitude expansion horn. The design method for integrating the ultrasonic vibration source was clarified, and the vibration characteristics of the vibration source were investigated. The ultrasonic source was used to atomize droplets, demonstrating its practical utility.
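
The FDTD method named in the first abstract of this section (Okubo and Horiuchi) can be illustrated with a minimal sketch. This is a generic one-dimensional leapfrog scheme, not the authors' system: it does not model measured radiation characteristics, a dense source grid, or a non-propagation region. The function name `fdtd_1d` and every default value (grid size, air density, Courant number, Gaussian source pulse) are assumptions chosen only for illustration.

```python
import math


def fdtd_1d(n_cells=200, n_steps=60, src=100, c=343.0, rho=1.2, dx=0.01, courant=0.5):
    """Run a 1D acoustic FDTD simulation and return (pressure, dt).

    All default values are illustrative assumptions, not taken from the paper.
    """
    dt = courant * dx / c          # time step; courant <= 1 keeps the 1D scheme stable
    p = [0.0] * n_cells            # sound pressure at cell centres
    u = [0.0] * (n_cells + 1)      # particle velocity at cell faces
    t0, width = 20 * dt, 6 * dt    # centre and width of the Gaussian source pulse

    for step in range(n_steps):
        # Velocity update from the pressure gradient.
        # u[0] and u[n_cells] are never updated, so they stay 0 (rigid walls).
        for i in range(1, n_cells):
            u[i] -= dt / (rho * dx) * (p[i] - p[i - 1])
        # Pressure update from the velocity divergence.
        for i in range(n_cells):
            p[i] -= rho * c * c * dt / dx * (u[i + 1] - u[i])
        # Soft source: add a Gaussian pulse to the pressure at one cell.
        t = step * dt
        p[src] += math.exp(-((t - t0) / width) ** 2)
    return p, dt
```

Calling `p, dt = fdtd_1d()` returns the pressure snapshot after 60 time steps. Because each step can carry the disturbance at most one cell outward, cells 60 or more positions away from the source are still exactly zero at that point.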

ACOUSTICAL LETTERS
—Special Issue on Speech Diversity and Its Applications—
FOREWORD
INVITED PAPERS
  • Kikuo Maekawa
    2025, Volume 46, Issue 1, pp. 45-54
    Published: 2025/01/01
    Released: 2025/01/01
    [Advance publication] Released: 2024/08/09
    Journal, Open Access

    Real-time MRI video imaging has had a significant impact on articulatory phonetics. Many new findings have been obtained using this technology, which enables objective observation of the whole vocal tract during speech production, something that had long been imagined only through subjective retrospection. In this paper, I introduce the specifications of the "Real-time MRI Articulatory Movement Database (rtMRIDB)" that my colleagues and I developed and its relevance to the study of diversity in Japanese phonetics. Some ongoing technological developments are also introduced.

  • Yongwei Li, Aijun Li, Jianhua Tao, Feng Li, Donna Erickson, Masato Aka ...
    2025, Volume 46, Issue 1, pp. 55-63
    Published: 2025/01/01
    Released: 2025/01/01
    [Advance publication] Released: 2024/08/24
    Journal, Open Access

    In human communication, emotions are usually perceived through multimodal cues; in recent years, emotions have also been studied from the perspective of dimensional approaches. The roles of audio and video cues in the perception of emotion categories have been investigated relatively extensively, but their contribution to emotion perception in dimensional space remains under-investigated, especially for Mandarin Chinese. In the present study, three psychoacoustic experiments were conducted to investigate the contributions of audio, visual, and audio-visual modalities to emotion perception in the valence and arousal space. Audio-only, video-only, and audio-video stimuli were presented to native Chinese subjects with normal hearing and vision for perceptual ratings of emotion in the valence and arousal dimensions. The results suggested that (1) different modalities contribute differently to perceiving the valence and arousal dimensions; (2) compared to the video-only modality, the audio-only modality generally decreases arousal and valence at lower levels, and increases arousal and valence at higher levels; (3) the video-only modality plays an important role in separating anger and happiness in the valence space.

INVITED REVIEW
  • Koichi Mori
    2025, Volume 46, Issue 1, pp. 64-69
    Published: 2025/01/01
    Released: 2025/01/01
    [Advance publication] Released: 2024/06/08
    Journal, Open Access

    The aim of this review is to introduce the concept of neurodiversity as applied to developmental stuttering. Since the introduction of the ICF by WHO in 2001, the social model has been introduced into clinical practice. However, it primarily asks the community to be responsible for the accommodation of persons with disabilities (PDs). In addition to the necessary changes in the legal and legislative environments to conform to the United Nations Convention on the Rights of Persons with Disabilities (2006), effective education and advocacy are needed for society to acknowledge and reduce the biases of ableism and the stigma of disabilities. Ableism is the claim that society is for able-bodied and able-minded people. Ableist remarks and behaviors may impact PDs adversely and are called microaggressions. The diversity movement tries to embrace PDs by removing the border between the able and the disabled. The etiology and characteristics of developmental stuttering are described, as well as its neurodiverse and complex nature. Recent advances in the treatment of stuttering without ableism are introduced. Education and advocacy for (neuro)diversity and inclusion in society are still sorely needed for medical and welfare professionals as well as for the general public.

PAPER
  • Hiroki Mori, Hironao Nishino
    2025, Volume 46, Issue 1, pp. 70-77
    Published: 2025/01/01
    Released: 2025/01/01
    [Advance publication] Released: 2024/08/07
    Journal, Open Access

    We propose an end-to-end conversational speech synthesis system that allows for flexible control of emotional states defined over emotion dimensions. We extend the Tacotron 2 and VITS architectures to accept emotion dimensions as input. The model is first pre-trained on a large-scale spontaneous speech corpus and then fine-tuned on a natural dialogue speech corpus with manually annotated perceived emotion in the form of pleasantness and arousal. Since the pre-training corpus lacks emotion information, we explore two pre-training strategies and demonstrate that applying an emotion dimension estimator before the pre-training enhances emotion controllability. Evaluation of the synthesized speech using VITS yields a mean opinion score of 4 or higher for naturalness. Furthermore, there is a correlation of R = 0.53 for pleasantness and R = 0.89 for arousal between the given and perceived emotional states. These results underscore the effectiveness of our proposed conversational speech synthesis system with emotion control.
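
The correlation between given and perceived emotional states reported in the abstract above can be illustrated with a short sketch. It assumes that R denotes Pearson's correlation coefficient, which the abstract does not state. The function name `pearson_r` and the example values are hypothetical and are not data from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical example: arousal values given to a synthesizer and the
# mean arousal that listeners perceived for each synthesized utterance.
given_arousal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
perceived_arousal = [1.8, 2.1, 3.4, 3.9, 4.6, 5.9, 6.2]
r_arousal = pearson_r(given_arousal, perceived_arousal)
```

A value of `r_arousal` close to 1 would indicate that the perceived emotion tracks the requested emotion closely, which is the sense in which the abstract uses correlation as a measure of controllability.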

TECHNICAL REPORTS
  • Yoshiko Arimoto, Yasuo Horiuchi, Sumio Ohno
    2025, Volume 46, Issue 1, pp. 78-86
    Published: 2025/01/01
    Released: 2025/01/01
    [Advance publication] Released: 2024/05/11
    Journal, Open Access

    A reliable method of determining the base frequency (Fb) for utterances of various speaking styles is critical to enabling stable command labeling in the Fujisaki model. To achieve stable command labeling for diverse expressions of speech, a linearly fitted model was developed that uses the 10th-percentile F0 of each utterance from three corpora of various speaking styles (read, acted, and spontaneous) as the independent variable to estimate a consistent Fb for each utterance. To assess the robustness of the model for unknown utterances, the model was applied to test data, including both open and corpus-open data not used for model development, and the difference between the estimated Fb and the Fb annotated by trained labelers was calculated. The resulting estimation model fit the manually labeled Fb values well, with a small root mean squared error (RMSE) of 0.096 and a high coefficient of determination (R²) of 0.89 for the closed dataset. Moreover, the model also exhibited a small RMSE of 0.091 and a high R² of 0.92 for the corpus-open dataset. The results revealed that the proposed model can reliably estimate the Fb of utterances with various speaking styles.

  • Mizuki Nagano, Yusuke Ijima, Sadao Hiroya
    2025, Volume 46, Issue 1, pp. 87-95
    Published: 2025/01/01
    Released: 2025/01/01
    [Advance publication] Released: 2024/08/01
    Journal, Open Access

    The retail industry strives to enhance the willingness to buy through various elements, such as store environment, layout, and advertising. Speech is one of the most effective methods used in advertising, particularly in broadcast advertising. Our previous study indicated that the stimulus-organism-response (SOR) theory, using emotional states, can partially explain the effect of advertising speech on the willingness to buy. This suggests that emotional states alone are not sufficient to explain the effect. In this study, we conducted an experiment to determine whether adding semantic primitives to the emotion-mediated SOR model could completely mediate the impact of advertising speech on the willingness to buy. Participants listened to speech with modified features (mean fundamental frequency (F0), speech rate, or standard deviation of F0) and rated their willingness to buy the advertised products, as well as their own emotions and semantic primitives. We found that adding semantic primitives as a mediator completely mediates the effect of the standard deviation of F0 in advertising speech on the willingness to buy. These results will be useful for developing speech synthesis methods aimed at increasing people's willingness to buy.
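
The base-frequency estimation described in the first technical report above (Arimoto, Horiuchi, and Ohno) fits a line from the 10th-percentile F0 of each utterance to its Fb and reports RMSE and R². The sketch below shows one way such a fit and its two metrics could be computed. It assumes ordinary least squares and an inclusive percentile definition, neither of which the abstract specifies, and it does not assume any particular frequency scale. All function names are hypothetical.

```python
import math
import statistics


def tenth_percentile(f0_values):
    """10th percentile of one utterance's F0 values (inclusive method, assumed)."""
    return statistics.quantiles(f0_values, n=10, method="inclusive")[0]


def fit_line(x, y):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def rmse(y_true, y_pred):
    """Root mean squared error between labeled and estimated values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

In use, `x` would hold one `tenth_percentile` value per utterance and `y` the manually labeled Fb for the same utterances. After `a, b = fit_line(x, y)`, the estimates `[a * xi + b for xi in x_test]` can be scored against held-out labels with `rmse` and `r_squared`.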

ACOUSTICAL LETTERS