Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
Latest Issue
Showing 1–10 of 10 articles from the selected issue
PAPERS
  • Yoshiko Arimoto, Dan Oishi, Minato Okubo
    2025 Volume 46 Issue 2 Pages 125-135
    Published: 2025/03/01
    Released on J-STAGE: 2025/03/01
    Advance online publication: 2024/11/06
    Journal Open Access

    To ensure the reliability of evaluations obtained through crowdsourcing services, this study demonstrated methods for selecting qualified evaluators and reliable ratings, using emotional ratings for nonverbal vocalizations obtained via a crowdsourcing service. To evaluate the efficiency of the methods, emotional ratings were also obtained through a listening experiment in an in-person laboratory setting. Three filtering criteria were demonstrated: (a) excluding evaluators who rated more than 45% of their assigned samples with a single value, (b) excluding evaluators who took less than 7 seconds to rate each assigned sample, and (c) excluding emotion rating instances associated with a low self-reported confidence rating. The results showed that the crowdsourcing listening test exhibited tendencies similar to those of the in-person test, with high correlation coefficients of 0.873 for arousal, 0.739 for pleasantness, and 0.704 for dominance when evaluators who took less than 7 seconds per speech sample were eliminated. However, the differences in the correlation coefficients between the filtered and non-filtered scores were only 0.001–0.007. Moreover, the results revealed that the self-reported confidence scores can eliminate unreliable evaluation ratings, but the correlation improved only marginally.
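    The two evaluator-level criteria can be sketched as simple predicates over each evaluator's ratings and response times. This is a hypothetical illustration, not the authors' implementation; in particular, the paper's exact per-sample handling of criterion (b) is not specified here, so the mean response time is used as a proxy.

```python
import numpy as np

def keep_evaluator(ratings, times, unique_frac=0.45, min_time=7.0):
    """Apply filtering criteria (a) and (b) to one evaluator.

    ratings: 1-D array of emotion ratings for the assigned samples.
    times:   1-D array of response times in seconds, one per sample.
    """
    ratings = np.asarray(ratings)
    times = np.asarray(times, dtype=float)
    # (a) exclude evaluators who give a single value to >45% of samples
    _, counts = np.unique(ratings, return_counts=True)
    if counts.max() / ratings.size > unique_frac:
        return False
    # (b) exclude evaluators who respond faster than 7 s
    #     (mean response time used here as a stand-in)
    if times.mean() < min_time:
        return False
    return True
```

    Criterion (c) operates on individual rating instances rather than evaluators, dropping ratings whose self-reported confidence falls below a threshold.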

  • Makoto Morinaga, Shigenori Yokoshima, Tomohiro Kobayashi, Sakae Yokoya ...
    2025 Volume 46 Issue 2 Pages 136-145
    Published: 2025/03/01
    Released on J-STAGE: 2025/03/01
    Advance online publication: 2024/10/29
    Journal Open Access

    The oppressive or vibratory sensation caused by low-frequency sound is a widely known sensation inherent to that type of sound. Previous studies using one-third octave band noise as stimuli identified the frequency region in which the oppressive or vibratory sensation is felt before other sensations such as loudness and noisiness (here called the peculiar region). However, it has been suggested that level fluctuations of one-third octave band noise affect the oppressive or vibratory sensation. Furthermore, few studies have investigated the threshold of these sensations. In the present study, we conducted laboratory experiments using low-frequency pure tones to investigate the peculiar region from 10 to 160 Hz as well as the sensation threshold. The peculiar region in which the oppressive or vibratory sensation became dominant was generally consistent with the findings of previous studies, although differences were found at relatively high frequencies such as 80 and 160 Hz. In addition, the median threshold value was lower than the lowest level of the peculiar region. The threshold differed greatly among the participants, and the higher the frequency, the more pronounced the difference. Multiple regression analysis suggested that these individual differences might be related to noise sensitivity.

  • Hien Ohnaka, Ryoichi Miyazaki
    2025 Volume 46 Issue 2 Pages 146-156
    Published: 2025/03/01
    Released on J-STAGE: 2025/03/01
    Advance online publication: 2024/11/18
    Journal Open Access

    This paper proposes an unsupervised DNN-based speech enhancement approach founded on deep priors (DPs). Here, DP signifies that DNNs are more inclined to produce clean speech signals than noise. Conventional DP-based methods typically train on a noisy speech signal using a random noise feature as input, stopping training at the point where only a clean speech signal has been generated. However, such approaches face challenges in determining the optimal stopping time, suffer performance degradation due to environmental background noise, and are subject to a trade-off between distortion of the clean speech signal and noise reduction performance. To address these challenges, we utilize two DNNs: one to generate a clean speech signal and the other to generate noise. The combined output of these networks closely approximates the noisy speech signal, and a loss term based on spectral kurtosis is used to separate the noisy speech signal into a clean speech signal and noise. The key advantage of this method is its ability to circumvent the trade-off and early stopping problems, as the signal is decomposed given a sufficient number of steps. Evaluation experiments demonstrate that the proposed method outperforms conventional methods for both white Gaussian and environmental noise while effectively mitigating early stopping problems.
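    The abstract describes the loss only as "based on spectral kurtosis." As a hedged illustration of why that statistic can tell speech apart from noise, the sketch below computes frame-wise excess kurtosis of magnitude spectra: sparse, tonal spectra (speech-like) score high, while Gaussian noise stays near its Rayleigh baseline. The framing and window choices here are assumptions, not the authors' implementation.

```python
import numpy as np

def spectral_kurtosis(x, frame=256):
    """Frame-wise excess kurtosis of the magnitude spectrum of x."""
    n = len(x) // frame
    frames = x[: n * frame].reshape(n, frame)
    mag = np.abs(np.fft.rfft(frames, axis=1))  # rectangular window STFT
    m = mag.mean(axis=1, keepdims=True)
    s = mag.std(axis=1, keepdims=True)
    return (((mag - m) / s) ** 4).mean(axis=1) - 3.0

rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)
t = np.arange(4096)
tone = np.sin(2 * np.pi * 16 * t / 256)  # exactly one DFT bin per frame

# sparse (speech-like) spectra have much higher kurtosis than noise
print(spectral_kurtosis(tone).mean() > spectral_kurtosis(noise).mean())  # True
```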

  • Takayuki Hidaka, Akira Omoto, Noriko Nishihara
    2025 Volume 46 Issue 2 Pages 157-166
    Published: 2025/03/01
    Released on J-STAGE: 2025/03/01
    Advance online publication: 2024/12/12
    Journal Open Access

    This paper studies whether musical experts and non-experts differ in their subjective judgments of the preferred reverberation time and clarity of concert halls, based on a psychoacoustic test. The test signals were piano and violin solos convolved with binaural room impulse responses measured at 34 positions in 18 symphonic halls. The experts consisted of outstanding musicians, music managers, recording engineers, and acousticians, all of whom had listening experience in many of the halls studied. The non-experts were students with more extensive musical training than typical students. The preferred reverberation times at mid-frequencies (average of 500 Hz and 1,000 Hz) for piano and violin were 1.2 to 2.0 s and 1.8 to 2.4 s for the experts, and 0.9 to 2.1 s and 1.6 to 2.7 s for the non-experts. The latter ranges were 50% and 83% broader for piano and violin, respectively. Clarity showed a similar tendency. This result indicates that subjective judgments by musical experts are more reliable than those by non-experts when designing actual concert halls.

TECHNICAL REPORT
  • Tatsuya Kitamura, Jin Oyama, Jing Sun, Ryoko Hayashi
    2025 Volume 46 Issue 2 Pages 167-172
    Published: 2025/03/01
    Released on J-STAGE: 2025/03/01
    Advance online publication: 2024/11/01
    Journal Open Access

    This study aimed to develop an indicator for assessing articulatory motion during fast, repetitive syllable production, focusing on fluency, periodicity, and consistency. The method utilizes a kymograph derived from ultrasound imaging of tongue movements in the midsagittal plane. The kymograph was generated by juxtaposing pixels along an observation line through the point of greatest tongue movement. Periodic patterns in the kymograph indicate controlled, consistent tongue and mandibular movements, whereas nonperiodic patterns suggest speech disturbances. The method employs a power spectral image obtained through a two-dimensional discrete Fourier transform of the kymograph. The resulting power spectrum represents the periodic components in the horizontal direction of the kymograph, with prominent peaks indicating consistent patterns. To validate the method, the authors analyzed ultrasound movies of healthy Japanese speakers, both fluent speakers and those who experienced a sense of speech clumsiness, producing repetitive syllables (/aka/, /aga/, /ata/, and /ada/). The results demonstrated the effectiveness of the indicator in distinguishing between periodic and nonperiodic tongue motions. The approach shows promise for application to real-time MRI movies, potentially opening new avenues for the in-depth analysis of motor speech function, and the indicator contributes to the assessment and quantification of articulatory motion.
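    As a minimal sketch of the core computation (the observation-line extraction and ultrasound handling are omitted, and the array layout and peak measure are assumptions rather than the authors' definitions), a 2-D DFT of the kymograph concentrates periodic horizontal (temporal) patterns into sharp spectral peaks:

```python
import numpy as np

def horizontal_peak_ratio(kymo):
    """2-D DFT power spectrum of a kymograph (rows: positions on the
    observation line, columns: time frames). Returns the fraction of
    non-DC horizontal-frequency power carried by the strongest
    component; high values indicate periodic, consistent motion."""
    power = np.abs(np.fft.fft2(kymo)) ** 2
    profile = power.sum(axis=0)            # collapse spatial frequencies
    half = profile[1 : len(profile) // 2]  # positive temporal freqs, no DC
    return half.max() / half.sum()

rng = np.random.default_rng(1)
t = np.arange(200)
periodic = np.tile(np.sin(2 * np.pi * 5 * t / 200), (40, 1))  # consistent repetition
erratic = rng.standard_normal((40, 200))                      # disturbed motion

print(horizontal_peak_ratio(periodic) > horizontal_peak_ratio(erratic))  # True
```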

ACOUSTICAL LETTERS
  • Yoshiki Nagatani, Masajiro Chikamori, Eriko Aiba
    2025 Volume 46 Issue 2 Pages 173-176
    Published: 2025/03/01
    Released on J-STAGE: 2025/03/01
    Advance online publication: 2024/11/02
    Journal Open Access

    Extensive research has explored the treatment or suppression of dementia through the presentation of sensory stimuli, such as pulse sounds with a repetition frequency of 40 Hz or amplitude-modulated sounds at 40 Hz, which evoke or entrain gamma waves. Empirical evidence indicates that, even when the equivalent noise level is adjusted, perceived loudness deviates significantly between such stimuli. This study measured the loudness of pulsed and modulated sounds using the method of constant stimuli. The results showed that the modulated wave required no adjustment, whereas the sinusoidal and rectangular pulses required adjustments of −13 dB and −21 dB, respectively, in addition to the equalization of their equivalent noise levels.
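    For reference, the two stimulus families being compared can be sketched as follows. The carrier frequency, sampling rate, and pulse width are illustrative assumptions; the letter's exact stimulus parameters are not reproduced here.

```python
import numpy as np

FS = 48000  # sampling rate in Hz (assumed)

def am_tone(fc=1000.0, fm=40.0, dur=1.0):
    """Carrier tone amplitude-modulated at 40 Hz (gamma-band rate)."""
    t = np.arange(int(FS * dur)) / FS
    return 0.5 * (1.0 + np.sin(2 * np.pi * fm * t)) * np.sin(2 * np.pi * fc * t)

def pulse_train(fc=1000.0, rep=40.0, width=0.005, dur=1.0):
    """Tone bursts repeated at 40 Hz (rectangular on/off gating)."""
    t = np.arange(int(FS * dur)) / FS
    gate = (t % (1.0 / rep)) < width
    return gate * np.sin(2 * np.pi * fc * t)
```

    Equalizing the equivalent (energy-mean) level of such signals does not equalize their loudness, which is the deviation the letter quantifies.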

  • Naoki Shinobu, Toma Yoshimatsu, Hiroaki Itou, Shihori Kozuka, Noriyosh ...
    2025 Volume 46 Issue 2 Pages 177-181
    Published: 2025/03/01
    Released on J-STAGE: 2025/03/01
    Advance online publication: 2024/10/22
    Journal Open Access

    Sound pressure interpolation is important for applications such as active noise control (ANC) with virtual sensing. However, many interpolation methods, developed in the frequency domain for free fields, cannot be directly used for real-time processing that must account for head reflections, as in ANC headrests. To overcome this problem, we propose a low-delay interpolation method using IIR filters for each order of the spherical harmonic expansion. The IIR filter coefficients are derived from rigid-sphere transfer functions in the z-plane. A computer simulation showed that the proposed method can predict the sound pressure on a rigid sphere using an open-sphere microphone array.

  • Leo Misono, Kenji Muto
    2025 Volume 46 Issue 2 Pages 182-185
    Published: 2025/03/01
    Released on J-STAGE: 2025/03/01
    Advance online publication: 2024/11/23
    Journal Open Access

    In Toyosu, Tokyo, the sounds of the large brown cicada and the robust cicada are heard during summer. This study aimed to accurately measure the frequency characteristics of the large brown cicada. Previously, we analyzed the climax sound of the robust cicada in a free field up to 20 kHz. In this paper, we extend the analysis to the large brown cicada in a free field up to 70 kHz using a microphone with flat sensitivity up to high frequencies. The dominant frequencies of the /ji/ and /ri/ sounds were measured at 5.7 kHz and 15 kHz, respectively.

  • Binh Thien Nguyen, Yukoh Wakabayashi, Yuting Geng, Kenta Iwai, Takanob ...
    2025 Volume 46 Issue 2 Pages 186-190
    Published: 2025/03/01
    Released on J-STAGE: 2025/03/01
    Advance online publication: 2024/11/02
    Journal Open Access

    This paper presents a DNN-based phase reconstruction algorithm for online speech enhancement. Although various online phase reconstruction algorithms have been proposed, many of them rely on the structure of the clean amplitude spectrum, which restricts their performance in speech enhancement applications where only noisy observations are available. In contrast, our proposed method directly estimates the clean phase from the noisy observation. Several aspects of phase reconstruction and their effects on speech enhancement are also investigated and discussed. Experimental results confirm that our method outperforms conventional online phase reconstruction methods for speech enhancement in all experimental settings.
