Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
Current issue
Displaying 1-10 of 10 articles from this issue
PAPERS
  • Yoshiko Arimoto, Dan Oishi, Minato Okubo
    2025 Volume 46 Issue 2 Pages 125-135
    Published: March 01, 2025
    Released on J-STAGE: March 01, 2025
    Advance online publication: November 06, 2024
    JOURNAL OPEN ACCESS

    To ensure the reliability of evaluations obtained through crowdsourcing services, this study demonstrated methods of selecting qualified evaluators and reliable ratings, using emotional ratings of nonverbal vocalizations collected via a crowdsourcing service. To evaluate the effectiveness of the methods, emotional ratings were also obtained through a listening experiment in an in-person laboratory setting. Three filtering criteria were demonstrated: (a) excluding evaluators who rate more than 45% of their assigned samples with a single value, (b) excluding evaluators who take less than 7 seconds to rate each assigned sample, and (c) excluding emotion rating instances associated with a low self-reported confidence rating. The results showed that the crowdsourced listening test exhibited tendencies similar to those of the in-person test, with high correlation coefficients of 0.873 for arousal, 0.739 for pleasantness, and 0.704 for dominance when evaluators who took less than 7 seconds to evaluate a speech sample were eliminated. However, the differences in the correlation coefficients between the filtered and non-filtered scores were only 0.001–0.007. Moreover, the results revealed that self-reported confidence scores can eliminate unreliable ratings, although the correlation improved only marginally.

    Download PDF (674K)
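    A minimal Python sketch (not the authors' code) of the three filtering criteria described above, assuming the crowdsourced ratings sit in a pandas DataFrame with hypothetical columns evaluator, rating, response_time_s, and confidence; the confidence threshold is an illustrative placeholder.

      import pandas as pd

      def filter_ratings(df, max_single_value_share=0.45, min_time_s=7.0, min_confidence=3):
          # (a) drop evaluators whose most frequent rating covers more than 45% of their samples
          share = df.groupby("evaluator")["rating"].apply(
              lambda r: r.value_counts(normalize=True).iloc[0])
          keep_a = set(share[share <= max_single_value_share].index)

          # (b) drop evaluators who spend, on average, under 7 seconds per sample
          mean_time = df.groupby("evaluator")["response_time_s"].mean()
          keep_b = set(mean_time[mean_time >= min_time_s].index)

          kept = df[df["evaluator"].isin(keep_a & keep_b)]
          # (c) drop individual rating instances with a low self-reported confidence
          return kept[kept["confidence"] >= min_confidence]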
  • Makoto Morinaga, Shigenori Yokoshima, Tomohiro Kobayashi, Sakae Yokoya ...
    2025 Volume 46 Issue 2 Pages 136-145
    Published: March 01, 2025
    Released on J-STAGE: March 01, 2025
    Advance online publication: October 29, 2024
    JOURNAL OPEN ACCESS

    The oppressive or vibratory sensation caused by low-frequency sound is a widely known sensation inherent to that type of sound. In previous studies using one-third octave band noise as stimuli, a frequency region was identified in which the oppressive or vibratory sensation is felt before other sensations such as loudness and noisiness (here called the peculiar region). However, it has been suggested that level fluctuations of one-third octave band noise affect the oppressive or vibratory sensation. Furthermore, few studies have investigated the threshold of these sensations. In the present study, we conducted laboratory experiments using low-frequency pure tones to investigate the peculiar region from 10 to 160 Hz as well as the sensation threshold. The peculiar region in which the oppressive or vibratory sensation became dominant was generally consistent with the findings of previous studies, but differences were found at relatively high frequencies such as 80 and 160 Hz. In addition, the median threshold was lower than the lowest level of the peculiar region. The threshold differed greatly among the participants, and the higher the frequency, the more pronounced the difference. Multiple regression analysis suggested that these individual differences might be related to noise sensitivity.

    Download PDF (950K)
  • Hien Ohnaka, Ryoichi Miyazaki
    2025 Volume 46 Issue 2 Pages 146-156
    Published: March 01, 2025
    Released on J-STAGE: March 01, 2025
    Advance online publication: November 18, 2024
    JOURNAL OPEN ACCESS

    This paper proposes an unsupervised DNN-based speech enhancement approach founded on deep priors (DPs). Here, DP signifies that DNNs are more inclined to produce clean speech signals than noise. Conventional DP-based methods typically involve training on a noisy speech signal using a random noise feature as input, stopping training only once a clean speech signal has been generated. However, such approaches encounter challenges in determining the optimal stopping time, suffer performance degradation due to environmental background noise, and face a trade-off between distortion of the clean speech signal and noise reduction performance. To address these challenges, we utilize two DNNs: one to generate a clean speech signal and the other to generate noise. The combined output of these networks closely approximates the noisy speech signal, and a loss term based on spectral kurtosis is used to separate the noisy speech signal into a clean speech signal and noise. The key advantage of this method lies in its ability to circumvent the trade-off and early stopping problems, as the signal is decomposed provided enough optimization steps are taken. Through evaluation experiments, we demonstrate that the proposed method outperforms conventional methods for both white Gaussian and environmental noise while effectively mitigating early stopping problems.

    Download PDF (1555K)
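    A hedged PyTorch-style sketch of the two-network decomposition described in the abstract above: one network output estimates the clean speech, the other the noise, their sum is fit to the noisy observation, and a spectral-kurtosis term steers speech-like structure into the speech branch. The loss form, STFT settings, and kurtosis weight are assumptions for illustration, not the authors' implementation.

      import torch

      def spectral_kurtosis(mag, eps=1e-8):
          # kurtosis of the magnitude spectrogram over time (an illustrative definition)
          mu = mag.mean(dim=-1, keepdim=True)
          sd = mag.std(dim=-1, keepdim=True) + eps
          return (((mag - mu) / sd) ** 4).mean()

      def decomposition_loss(speech_est, noise_est, noisy, alpha=0.1):
          # the two generator outputs must add up to the observed noisy signal
          recon = torch.mean((speech_est + noise_est - noisy) ** 2)
          win = torch.hann_window(512, device=speech_est.device)
          mag = torch.stft(speech_est, n_fft=512, window=win, return_complex=True).abs()
          # rewarding high spectral kurtosis pushes sparse, speech-like energy into speech_est
          return recon - alpha * spectral_kurtosis(mag)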
  • Takayuki Hidaka, Akira Omoto, Noriko Nishihara
    2025 Volume 46 Issue 2 Pages 157-166
    Published: March 01, 2025
    Released on J-STAGE: March 01, 2025
    Advance online publication: December 12, 2024
    JOURNAL OPEN ACCESS

    This paper studies whether there is a difference in subjective judgments between musical experts and non-experts regarding the preferred reverberation time and clarity of concert halls, based on a psychoacoustic test. The test signals were piano and violin solos convolved with binaural room impulse responses measured at 34 positions in 18 symphonic halls. The experts were outstanding musicians, music managers, recording engineers, and acousticians, all of whom had listening experience in many of the halls studied. The non-experts were students with more extensive musical training than average. The preferred mid-frequency reverberation times (average of 500 Hz and 1,000 Hz) obtained for piano and violin were 1.2 to 2.0 s and 1.8 to 2.4 s for the experts, and 0.9 to 2.1 s and 1.6 to 2.7 s for the non-experts; the latter ranges are 50% and 83% broader for piano and violin, respectively. Clarity showed a similar tendency. This result indicates that subjective judgments by musical experts are more reliable than those by non-experts when designing actual concert halls.

    Download PDF (725K)
TECHNICAL REPORT
  • Tatsuya Kitamura, Jin Oyama, Jing Sun, Ryoko Hayashi
    2025 Volume 46 Issue 2 Pages 167-172
    Published: March 01, 2025
    Released on J-STAGE: March 01, 2025
    Advance online publication: November 01, 2024
    JOURNAL OPEN ACCESS

    This study aimed to develop an indicator for assessing articulatory motion during fast, repetitive syllable production, focusing on fluency, periodicity, and consistency. The method utilizes a kymograph derived from ultrasound imaging of tongue movements in the midsagittal plane. The kymograph is generated by juxtaposing, frame by frame, the pixels along an observation line through the point of greatest tongue movement. Periodic patterns in the kymograph indicate controlled, consistent tongue and mandibular movements, whereas nonperiodic patterns suggest speech disturbances. The method employs a power spectral image obtained through a two-dimensional discrete Fourier transform of the kymograph. The resulting power spectrum represents the periodic components in the horizontal direction of the kymograph, with prominent peaks indicating consistent patterns. To validate the method, the authors analyzed ultrasound movies of healthy Japanese speakers (both fluent speakers and those who experienced a sense of speech clumsiness) producing the repetitive syllables /aka/, /aga/, /ata/, and /ada/. The results demonstrated the effectiveness of the indicator in distinguishing between periodic and nonperiodic tongue motions. This approach shows promise for application to real-time MRI movies, potentially opening new avenues for the in-depth analysis of motor speech function, and contributes to the assessment and quantification of articulatory motion.

    Download PDF (609K)
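    A small NumPy sketch of the kymograph analysis outlined above: pixels along an observation line are stacked frame by frame into a kymograph, and its 2D power spectrum exposes periodicity along the time axis. Frame sizes and the observation-line coordinates are placeholders, not the authors' settings.

      import numpy as np

      def kymograph_power_spectrum(frames, line_rows, line_col):
          # columns of the kymograph are successive frames; rows are positions on the observation line
          kymo = np.stack([f[line_rows, line_col] for f in frames], axis=1).astype(float)
          # 2D DFT power spectrum; consistent repetition shows up as prominent horizontal-axis peaks
          spec = np.abs(np.fft.fftshift(np.fft.fft2(kymo - kymo.mean()))) ** 2
          return kymo, spec

      # usage with dummy data standing in for midsagittal ultrasound frames
      frames = [np.random.rand(128, 128) for _ in range(200)]
      kymo, spec = kymograph_power_spectrum(frames, line_rows=np.arange(40, 90), line_col=64)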
ACOUSTICAL LETTERS
  • Yoshiki Nagatani, Masajiro Chikamori, Eriko Aiba
    2025 Volume 46 Issue 2 Pages 173-176
    Published: March 01, 2025
    Released on J-STAGE: March 01, 2025
    Advance online publication: November 02, 2024
    JOURNAL OPEN ACCESS

    Extensive research has explored the treatment or suppression of dementia through the presentation of sensory stimuli, such as pulse sounds with a repetition frequency of 40 Hz or amplitude-modulated sounds at 40 Hz, which evoke or entrain gamma waves. Empirical evidence indicates that, even when the equivalent noise level is adjusted, there is a significant deviation in perceived loudness. This study measured the loudness of pulsed and modulated sounds using the method of constant stimuli. The results showed that the modulated wave required no adjustment, whereas the sinusoidal and rectangular pulses required adjustments of −13 dB and −21 dB, respectively, in addition to the equalization of their equivalent noise levels.

    Download PDF (675K)
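    A short NumPy sketch of the two stimulus families contrasted in the letter above: a tone amplitude-modulated at 40 Hz and a pulse train with a 40 Hz repetition rate. The carrier frequency, duration, and pulse width are illustrative choices, not those of the study.

      import numpy as np

      fs, dur, fm = 48000, 1.0, 40.0                     # sample rate, duration, 40 Hz rate
      t = np.arange(int(fs * dur)) / fs

      # amplitude-modulated tone: 1 kHz carrier with a 40 Hz sinusoidal envelope
      am = 0.5 * (1 + np.sin(2 * np.pi * fm * t)) * np.sin(2 * np.pi * 1000 * t)

      # rectangular pulse train: 1 ms pulses repeating 40 times per second
      pulses = np.zeros_like(t)
      for onset in range(0, len(t), int(fs / fm)):
          pulses[onset:onset + int(0.001 * fs)] = 1.0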
  • Naoki Shinobu, Toma Yoshimatsu, Hiroaki Itou, Shihori Kozuka, Noriyosh ...
    2025 Volume 46 Issue 2 Pages 177-181
    Published: March 01, 2025
    Released on J-STAGE: March 01, 2025
    Advance online publication: October 22, 2024
    JOURNAL OPEN ACCESS

    Sound pressure interpolation is important for applications such as active noise control (ANC) with virtual sensing. However, many interpolation methods, developed in the frequency domain for free fields, cannot be directly used for real-time processing that must account for head reflections, as in ANC headrests. To overcome this problem, we propose a low-delay interpolation method using IIR filters for each order of a spherical harmonic expansion. The IIR filter coefficients are derived from rigid-sphere transfer functions in the z-plane. A computer simulation showed that the proposed method can predict the sound pressure on a rigid sphere using an open-sphere microphone array.

    Download PDF (652K)
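    A rough SciPy-based sketch of the signal flow suggested by the abstract above: the array signals are expanded into spherical-harmonic orders and each order passes through its own IIR filter so that interpolation can run sample by sample with low delay. The coefficients here are placeholders; the letter derives them from rigid-sphere transfer functions in the z-plane.

      from scipy.signal import lfilter

      def interpolate_pressure(order_signals, iir_coeffs):
          # order_signals: {order n: time-domain signal of that spherical-harmonic order}
          # iir_coeffs:    {order n: (b, a)} placeholder filter coefficients
          out = 0.0
          for n, x_n in order_signals.items():
              b, a = iir_coeffs[n]
              out = out + lfilter(b, a, x_n)   # per-order IIR filtering, summed over orders
          return out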
  • Leo Misono, Kenji Muto
    2025 Volume 46 Issue 2 Pages 182-185
    Published: March 01, 2025
    Released on J-STAGE: March 01, 2025
    Advance online publication: November 23, 2024
    JOURNAL OPEN ACCESS

    In Toyosu, Tokyo, the sounds of the large brown cicada and the robust cicada are heard during summer. This study aimed to accurately measure the frequency characteristics of the large brown cicada. Previously, we analyzed the climax sound of the robust cicada in a free field up to 20 kHz. In this paper, we extended the analysis to the large brown cicada in a free field up to 70 kHz, using a microphone with flat sensitivity up to high frequencies. The dominant frequencies of the /ji/ and /ri/ sounds were measured at 5.7 kHz and 15 kHz, respectively.

    Download PDF (915K)
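    A small sketch of how a dominant frequency such as the reported 5.7 kHz and 15 kHz peaks can be read from a recording with a Welch power spectral density; the sample rate and the synthetic test signal are placeholders (analysis up to 70 kHz requires a sample rate above 140 kHz).

      import numpy as np
      from scipy.signal import welch

      def dominant_frequency(x, fs):
          f, pxx = welch(x, fs=fs, nperseg=8192)   # averaged periodogram of the recording
          return f[np.argmax(pxx)]                 # frequency of the strongest spectral peak

      # usage with a synthetic 5.7 kHz tone standing in for a recorded /ji/ segment
      fs = 192000
      t = np.arange(fs) / fs
      print(dominant_frequency(np.sin(2 * np.pi * 5700 * t), fs))   # close to 5700 Hz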
  • Binh Thien Nguyen, Yukoh Wakabayashi, Yuting Geng, Kenta Iwai, Takanob ...
    2025 Volume 46 Issue 2 Pages 186-190
    Published: March 01, 2025
    Released on J-STAGE: March 01, 2025
    Advance online publication: November 02, 2024
    JOURNAL OPEN ACCESS

    This paper presents a DNN-based phase reconstruction algorithm for online speech enhancement. Although various online phase reconstruction algorithms have been proposed, many of them rely on the structure of the clean amplitude. This restricts their performance in speech enhancement applications, where only noisy observations are available. In contrast, our proposed method directly estimates the clean phase from the noisy observation. Several aspects of phase reconstruction and their effects on speech enhancement are also investigated and discussed. Experimental results confirm that our method performs better than conventional online phase reconstruction methods for speech enhancement in all experimental settings.

    Download PDF (762K)
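    A brief sketch of the resynthesis step implied by the abstract above: once a phase estimate is available, it is recombined with an amplitude estimate and inverted back to a waveform. The DNN that produces the phase is out of scope here; the array names and STFT settings are placeholders.

      import numpy as np
      import librosa

      def resynthesize(amp_est, phase_est, hop_length=256):
          # amp_est, phase_est: (frequency bins, frames) arrays from the enhancement front end
          spec = amp_est * np.exp(1j * phase_est)             # recombine magnitude and phase
          return librosa.istft(spec, hop_length=hop_length)   # back to the time domain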