Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
Advance online publications
Showing 1-22 of 22 advance online publications
  • Naofumi Aoki, Kenichi Ikeda, Shio Tamba, Tatsuo Sugama, Motoya Harada, ...
    Article ID: e26.08
    Published: 2026
    [Advance online publication] Released: 2026/04/11
    Journal Open access Advance online publication

    Acoustic communication is a practical approach for short-range wireless data exchange. In consumer products, inexpensive microprocessors often cause frequency detuning due to inaccurate clocks, leading to increased communication errors. To address this problem, we investigate improving clock-lag estimation accuracy through appropriate selection of carrier-frequency sets. We formulate an optimization-based carrier-frequency design that disperses adjacent frequency ratios. Simulation results suggest that the proposed method can reduce the ambiguity of clock-lag estimation, particularly in scenarios where certain carrier frequencies, such as edge frequencies, are omitted.

  • Mariko Tsuruta-Hamamura, Hiroshi Hasegawa, Shin-ichiro Iwamiya
    Article ID: e25.88
    Published: 2026
    [Advance online publication] Released: 2026/04/08
    Journal Open access Advance online publication

    Previous studies have reported gender differences in perceived loudness. For instance, women tend to assign higher loudness scores to sounds with the same sound pressure level than men when verbal expressions such as “soft” and “loud” are used to evaluate perceived loudness. However, when a ratio scale was used, gender differences were observed under limited conditions in Chinese participants but not Japanese participants. In this study, to clarify the factors affecting gender differences in loudness perception, we conducted four experiments involving magnitude estimation and magnitude production methods in Japanese participants. We examined gender differences in perceived loudness with respect to changes in sound pressure level. The power exponent α in Stevens’ power law, estimated from the experimental results of the four experiments, did not show a clear gender-based difference. According to our results, gender differences in judgment criteria using verbal expressions such as “soft” and “loud” might be a principal factor causing gender differences in the evaluation of perceived loudness, at least among Japanese participants.
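
    As a rough illustration of how an exponent α in Stevens’ power law can be estimated from magnitude-estimation data, the sketch below fits a straight line in log-log coordinates. It assumes the law is written as ψ = k·p^α with respect to sound pressure p, so that log10(ψ) is linear in L/20 for a level L in dB. The function name and the synthetic data are hypothetical and are not taken from the paper.

```python
import math


def fit_power_exponent(levels_db, magnitudes):
    """Least-squares slope of log10(magnitude) against log10(pressure ratio).

    Assumes psi = k * p**alpha, with p the sound pressure, so the slope is alpha.
    """
    # L = 20 * log10(p / p0), hence log10(p / p0) = L / 20
    xs = [level / 20.0 for level in levels_db]
    ys = [math.log10(m) for m in magnitudes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic (hypothetical) responses generated with alpha = 0.6
levels = [40, 50, 60, 70, 80]
estimates = [10 ** (0.6 * level / 20.0) for level in levels]
alpha = fit_power_exponent(levels, estimates)
```

    Fitting the exponent separately for each participant group and comparing the slopes is one simple way to look for a group difference of the kind discussed above.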

  • Yutao Zhang, Shiori Totsuka, Yuting Geng, Masato Nakayama, Ryo Akama, ...
    Article ID: e25.110
    Published: 2026
    [Advance online publication] Released: 2026/04/07
    Journal Open access Advance online publication

    This letter proposes a new Kuzushiji transcription framework that integrates optical character recognition (OCR) with read-speech automatic speech recognition (ASR) via hiragana-level fusion, without requiring additional model training. The framework uses the transcriber’s read-speech as an additional modality to guide beam-search OCR hypothesis selection for Kuzushiji transcription. Each OCR candidate is scored based on its phonetic similarity to the ASR output of the corresponding Kuzushiji read-speech at the hiragana-sequence level. Evaluation results show the effectiveness of the proposed framework in reducing the character error rate in contrast to conventional OCR-only Kuzushiji transcription.
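
    The hypothesis-selection idea can be sketched as a simple rescoring step. The abstract does not state which phonetic similarity measure is used, so the sketch below assumes a normalized edit distance between hiragana strings; the function names, the additive score combination, the weight, and the example candidates are all hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (character level)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means identical hiragana sequences."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def rescore(candidates, asr_hiragana, weight=1.0):
    """Pick the (hiragana, ocr_score) candidate with the best combined score."""
    return max(
        candidates,
        key=lambda c: c[1] + weight * similarity(c[0], asr_hiragana),
    )


# Hypothetical beam-search candidates with OCR log-scores
candidates = [("かわのなかれ", -1.0), ("かわのながれ", -1.1)]
best = rescore(candidates, asr_hiragana="かわのながれ")
```

    In this toy case the second candidate wins even though its OCR score is slightly lower, because it matches the read-speech transcript exactly.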

  • Yuto Otani, Shun Sawada, Hidefumi Ohmura, Kouichi Katsurada
    Article ID: e25.75
    Published: 2026
    [Advance online publication] Released: 2026/04/07
    Journal Open access Advance online publication

    This paper presents a novel approach for speech synthesis using articulatory movements captured by real-time magnetic resonance imaging (rtMRI), focusing on fundamental frequency (F0) estimation mechanisms. Although recent rtMRI-based methods have achieved promising results, it remains unclear how F0 information is reproduced, given rtMRI's limited ability to capture vocal fold vibrations. To address this gap, we propose a speech synthesis method that processes only four consecutive rtMRI frames (~150 ms)—preventing reliance on extended linguistic context to infer F0. Our method employs an EfficientNetV2-BiLSTM network that enables sophisticated F0-related feature extraction for mel-spectrogram estimation, followed by a HiFi-GAN vocoder for high-fidelity waveform generation. Evaluations on the ATR 503 sentences rtMRI database demonstrate intelligible speech synthesis with accurate F0 reproduction. Building on these results, we further estimate F0 from single MRI frames, confirming that F0 can be derived without temporal context. To explore the underlying basis, we apply optical flow analysis to visualize subtle articulatory differences associated with F0 control, primarily revealing upward/forward larynx and tongue shifts with increasing F0. Additionally, distinct patterns were observed in male speakers at low F0 ranges. These findings empirically validate the relationship between articulatory configurations and F0 control, demonstrating feasibility in rtMRI-based speech synthesis.

  • Shigeaki Amano, Kimiko Yamakawa, Arkadiusz Rojczyk
    Article ID: e26.17
    Published: 2026
    [Advance online publication] Released: 2026/04/02
    Journal Open access Advance online publication

    Previous studies on geminate and singleton consonants have employed the ratio of geminate duration to singleton duration (the GS ratio) as an invariant parameter to nullify the effects of speaking rate variation. However, the validity of the implicit assumption that the GS ratio effectively compensates for duration variations induced by speaking rate has not yet been empirically tested. This study formalized this GS ratio assumption mathematically in two scenarios: linear and logarithmic scales of duration. Furthermore, it examined the validity of this assumption using parameters derived from previous research. Our analysis identified the specific mathematical conditions that must be satisfied for the implicit assumption to hold. The empirical test revealed that these conditions were not met, indicating that the GS ratio varies across different speaking rates. These results suggest that the implicit assumption of the GS ratio is not supported in either linear or logarithmic scales. This contradicts the assumptions in previous studies and indicates that careful verification is necessary when employing the GS ratio.
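
    One way to see why a duration ratio need not be invariant is to assume, purely for illustration, that each duration is linear in some speaking-rate variable x: G(x) = a_g + b_g·x and S(x) = a_s + b_s·x. Under that assumption G/S is constant in x only when a_g·b_s = a_s·b_g. This linear form and all parameter values below are hypothetical and are not the paper's formalization.

```python
def gs_ratio(x, geminate, singleton):
    """Ratio of geminate to singleton duration at rate variable x.

    Each consonant is a hypothetical (intercept, slope) pair, so its
    duration is intercept + slope * x.
    """
    g = geminate[0] + geminate[1] * x
    s = singleton[0] + singleton[1] * x
    return g / s


def ratio_is_rate_invariant(geminate, singleton, tol=1e-12):
    """True when (a_g + b_g x) / (a_s + b_s x) does not depend on x."""
    # The ratio is constant in x exactly when a_g * b_s == a_s * b_g.
    return abs(geminate[0] * singleton[1] - singleton[0] * geminate[1]) <= tol


# Proportional parameters: the ratio stays at 2 for any x
proportional = ((40.0, 2.0), (20.0, 1.0))
# Non-proportional parameters: the ratio drifts with x
drifting = ((100.0, 1.0), (20.0, 1.0))
```

    With the drifting parameters, gs_ratio(50.0, *drifting) and gs_ratio(100.0, *drifting) differ, which is the kind of rate dependence the abstract reports.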

  • Masahiro Toyoda
    Article ID: e26.18
    Published: 2026
    [Advance online publication] Released: 2026/04/01
    Journal Open access Advance online publication

    The receiving points used for measuring floor impact sound levels are specified as follows: “Within the receiving room, distribute four or more measurement points evenly, each separated by at least 70 cm, with spaces at least 50 cm away from the ceiling, surrounding walls, and floor surface.” The energy-averaged sound level over these points is used for evaluation. The present letter verifies whether this averaged value sufficiently represents the floor impact sound levels throughout the receiving room. Using the finite-difference time-domain method, a two-story concrete building was analyzed. The floor impact sound levels were compared between a case where multiple receiving points were installed and a case where only the receiving points specified by the standard were installed. The comparison confirmed that while differences exceeding 5 dB were observed in several frequency bands, the specified points sufficiently represent the floor impact sound level with adequate accuracy across a broad frequency range, including the decisive frequency.
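
    For reference, the energy average of sound levels over several receiving points is conventionally computed as 10·log10 of the mean of 10^(L_i/10). The sketch below shows that calculation; the function name and the example levels are hypothetical and are not values from the letter.

```python
import math


def energy_average_level(levels_db):
    """Energy-averaged level in dB over a list of point levels in dB."""
    mean_energy = sum(10 ** (level / 10.0) for level in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_energy)


# Hypothetical levels at four receiving points in one frequency band
point_levels = [62.0, 65.0, 58.0, 70.0]
band_level = energy_average_level(point_levels)
```

    Because the averaging is done on energy rather than on decibel values, the result is pulled toward the loudest points and is never below the arithmetic mean of the levels.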

  • Kento Hara, Tsuguto Hoshino, Motoki Yairi, Takashi Takeuchi, Philip A. ...
    Article ID: e26.13
    Published: 2026
    [Advance online publication] Released: 2026/03/31
    Journal Open access Advance online publication

    The theory of the Optimal Source Distribution (OSD) has been proposed as the basis for binaural synthesis over loudspeakers. In applying this theory to practical systems, discrete linear loudspeaker arrangements and frequency-band division filtering cause an increase in the condition number of the transfer function matrix. This paper proposes a transfer function matrix reconstruction method by applying gain and delay parameters that are optimized using numerical optimization to reduce the condition number. The effectiveness of the proposed method is experimentally validated using a multiway loudspeaker system based on the OSD principle.

  • Yuta Goshima, Yoichi Haneda
    Article ID: e26.14
    Published: 2026
    [Advance online publication] Released: 2026/03/28
    Journal Open access Advance online publication

    Synthesizing virtual sound sources that traverse a linear loudspeaker array poses a challenge for wave field synthesis (WFS) due to singularities. To address this issue, we propose a time-domain representation of WFS based on the spatial-shifting filter, derived via analytical inverse temporal and spatial Fourier transforms. By applying the stationary phase approximation, the proposed method derives the driving function suitable for sample-by-sample processing, thereby eliminating frame-length latency. The validity of the proposed method is demonstrated through numerical simulations.

  • Kazunori Harada, Yasuhiro Hiraguri, Takuya Oshima, Yoshinori Saito, Sa ...
    Article ID: e25.79
    Published: 2026
    [Advance online publication] Released: 2026/03/20
    Journal Open access Advance online publication

    This study assesses the health impacts of road traffic noise in Osaka City (Tennoji Ward) and Higashiosaka City, Osaka Prefecture, using strategic noise maps. The ASJ RTN-Model 2018 was employed to estimate noise levels, and three noise reduction scenarios were analyzed: reducing light-vehicle power levels, reducing heavy-vehicle power levels, and installing porous pavement. The study also combined estimates of building occupancy from open data with noise exposure levels, enabling assessment of the health impact of noise. Results for the three scenarios and the reference case showed that approximately 10% and 5% of the population in Tennoji Ward and Higashiosaka City, respectively, are estimated to be suffering from high annoyance due to traffic noise. While noise reduction measures effectively decreased exposure levels, their impact on ischaemic heart disease incidence was limited, suggesting the need for more comprehensive mitigation strategies. The effectiveness of different scenarios varied between cities, highlighting the importance of considering local urban characteristics in noise management planning. This study demonstrates the practical application of strategic noise mapping for health impact assessment in Japanese cities and provides valuable insights for evidence-based urban noise policy development.

  • Mizuki Iwagami, Yuting Geng, Masato Nakayama, Takanobu Nishiura
    Article ID: e25.100
    Published: 2026
    [Advance online publication] Released: 2026/03/06
    Journal Open access Advance online publication

    Parametric array loudspeakers achieve sharp directivity in audible sound by utilizing nonlinear interactions among ultrasounds in air. Conventionally, pin-spot audio has been realized by emitting ultrasounds separately. However, nonlinear interactions among sideband components lead to speech leakage outside the audio spot. A previous study applied subband decomposition to the sideband of an amplitude-modulated signal, which altered the spectrum of the leaked sound but provided limited controllability because only one sideband was processed. This study proposes a pin-spot audio design that combines double sideband modulation with suppressed carrier and subband decomposition applied across both upper and lower sidebands. By designing the structures of the two sideband spectra, the proposed method controls nonlinear interactions across them, producing more complex patterns in the resulting spectrum in air and reducing speech leakage. Moreover, a logarithmic subband decomposition approximately consistent with perceptual frequency spacing and an asymmetric sideband assignment between the two sidebands are introduced. As a result, speech leakage is reduced not merely by lowering the sound pressure level, but by altering the structure of the demodulated sound spectrum.

  • Chiho Haruta, Nobutaka Ono
    Article ID: e25.62
    Published: 2026
    [Advance online publication] Released: 2026/02/14
    Journal Open access Advance online publication

    In this paper, we propose an element selection approach for speech enhancement using deep neural networks (DNNs) targeting small devices with complexity constraints, such as hearing aids. Element selection reduces the input dimensionality by selecting specific elements from the input vector. Unlike other dimensionality reduction algorithms such as principal component analysis, element selection does not require multiplications, making it suitable for low-complexity environments. To optimize which elements are selected, we propose two methods: 1) a linear-regression-based method minimizing the regression error in estimating a target vector from the dimensionality-reduced vector and 2) a pruning-based method that selects elements corresponding to the remaining weight coefficients in the first layer after applying structured pruning to a DNN. We evaluate their performance in a speech enhancement task under complexity constraints, assuming a simple fully-connected network, with no more than 4 × 10⁵ multiplications per inference and an algorithmic delay below 8 ms. Experiments show that the proposed approach under the complexity constraints achieves a scale-invariant source-to-distortion ratio (SI-SDR) improvement of 5.6 dB on average compared to non-processed noisy speech at signal-to-noise ratios of -5, 0, and 5 dB, and a 2.54 dB SI-SDR improvement compared to simply using only the latest frames.
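
    For readers unfamiliar with the metric, the sketch below computes SI-SDR using its commonly used definition: the estimate is projected onto the reference, and the energy of that scaled target is compared with the energy of the residual. The function name and the toy signals are hypothetical; the paper's own evaluation code is not shown in the abstract.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant source-to-distortion ratio in dB."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / ref_energy                      # optimal scaling of the reference
    target = [scale * r for r in reference]       # part of the estimate explained by it
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10(target_energy / residual_energy)


# Toy signals: the estimate is the reference plus a small orthogonal error
reference = [1.0, 0.0, -1.0, 0.0]
estimate = [1.0, 0.1, -1.0, -0.1]
value_db = si_sdr(estimate, reference)
```

    An SI-SDR improvement such as the ones quoted above is the difference between this value for the enhanced output and for the unprocessed noisy input, both measured against the same clean reference.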

  • Kotaro Kinoshita, Takehiro Sugimoto
    Article ID: e25.84
    Published: 2026
    [Advance online publication] Released: 2026/02/10
    Journal Open access Advance online publication

    To enhance the immersive experience of six-degrees-of-freedom (6DoF) content, we previously proposed a distance attenuation model of the human voice, referred to as the head model, that incorporates direction-dependent head diffraction and the radiation point on the basis of measurements of human utterances. In this study, we conducted a subjective evaluation experiment to verify the perceptual validity and practical applicability of the proposed head model in content production. The results suggested that the head model tends to represent the distance attenuation of the human voice under various directional and distance conditions more accurately than the conventional model. Furthermore, listeners generally did not report perceptible differences between the voice to which the head model was applied and the measured voice. These findings indicate that the head model is perceptually suitable for use in 6DoF content.

  • Toshiki Hanyu, Kazuma Hoshi
    Article ID: e25.111
    Published: 2026
    [Advance online publication] Released: 2026/02/03
    Journal Open access Advance online publication

    Sound-field intensity can be analyzed by measuring sound pressure and particle velocity. This study examines a previously proposed c–c method, in which the sound pressure and particle velocity can be measured using cardioid microphones. This method was validated for both far and near sound fields. However, to be considered as a viable alternative to the conventional p–p and p–u methods, the c–c method must also be validated in complex sound fields containing interfering sound waves. In our previous study, we took measurements by rotating a single microphone. This study aimed to further validate the c–c method using a pair of face-to-face cardioid microphones by conducting experiments in complex sound fields. The experimental results show that the sound pressure, particle velocity, complex intensity, active intensity, and reactive intensity measurements obtained in interference sound fields using the c–c method corresponded well with those obtained using the p–u probe. These results demonstrate that the c–c method can be applied to complex sound fields, including interference sound fields, and that a sound intensity probe can be constructed using a pair of cardioid microphones even if their cardioid directivities are not ideal.

  • Masao Kimura
    Article ID: e25.74
    Published: 2026
    [Advance online publication] Released: 2026/02/03
    Journal Open access Advance online publication

    The author has proposed a modified gap stiffness model incorporated into the Biot model, known as the BIMGS model. The model describes acoustical relaxation caused by local flow in the gap between grains and has been shown to explain experimental results reasonably well. In this study, the micro-geometric structure of a gap was investigated experimentally, and the results showed that the contact radius and separation distance depend on the grain diameter. Moreover, the compressional wave speeds and attenuation in water-saturated silica sands were calculated and examined as a function of frequency. The results demonstrated the existence of two attenuation peaks.

  • Naofumi Aoki
    Article ID: e25.98
    Published: 2026
    [Advance online publication] Released: 2026/01/31
    Journal Open access Advance online publication

    This paper proposes an audio information-hiding technique termed Audio Anagram, inspired by linguistic wordplay. The technique introduces a cryptographic mechanism analogous to an anagram, in which rearranged data remains perceptually plausible, thereby concealing the presence of the hidden message. The proposed method enables the extraction of secret audio data by reversing a mapping that rearranges the cover audio signal. This approach serves not only as a practically feasible method but also as a creative audio manipulation technique, allowing two distinct audio streams to coexist within a single signal.

  • Qiyuan Wang, Ken Anai, Hiroo Yano, Shinichi Sakamoto
    Article ID: e25.73
    Published: 2026
    [Advance online publication] Released: 2026/01/27
    Journal Open access Advance online publication

    The assessment of environmental noise in urban areas is challenged by the difficulty of effectively modelling complex noise propagation among buildings. As one solution, the ASJ RTN-Model, a calculation model for road traffic noise, includes an effective calculation method for predicting noise propagation behind buildings, which enables prediction of noise distribution without a heavy computational burden. However, the current model is only applicable to a noise source with fixed frequency characteristics and a fixed height at ground level, so further efforts are needed to extend its application range. In previous work, we established a modified model that is frequency-dependent and supports various frequency characteristics of the noise source. Building on those successful demonstrations, we further propose extending the model to elevated noise sources. This effort mainly involves re-investigating how the geometric variables and constants in the prediction equations are determined; the extended model is then examined by comparing its predictions with measured results from scale model experiments.

  • Jun Takahashi, Natsuki Toda, Hironori Takemoto
    Article ID: e25.90
    Published: 2026
    [Advance online publication] Released: 2026/01/22
    Journal Open access Advance online publication

    This study examined how professional opera singers modulate vocal tract configurations to express “bright” and “dark” timbres. In the “bright” condition, F2–F3 frequencies increased with lip opening, whereas in the “dark” condition, F1 and F3 frequencies decreased with pharyngeal expansion and laryngeal lowering. These effects were more pronounced at lower pitches. An untrained participant showed minimal articulatory or acoustic variation across conditions. Simulations based on vocal-tract area functions confirmed F1–F2 changes but only weakly reflected F3, suggesting additional lateral or vertical adjustments beyond the midsagittal plane. These findings clarify articulatory–acoustic mechanisms of timbre control in operatic singing.

  • Shengyi Wu, Yuying Sang
    Article ID: e25.72
    Published: 2026
    [Advance online publication] Released: 2026/01/09
    Journal Open access Advance online publication

    The Contrastive Analysis Hypothesis (CAH) and the Speech Learning Model (SLM) make distinct predictions about how L1-L2 phonological similarity affects second language acquisition. This study evaluates these models through an analysis of Mandarin affricate production by Thai learners. Acoustic measurements showed significant differences between learners and native speakers across all parameters examined. Native speaker transcriptions revealed that while Thai learners successfully produced Mandarin's aspiration contrasts, they exhibited difficulties with place and manner distinctions. For place contrasts, learners frequently substituted alveolar (/ts/) and retroflex (/tʂ/) affricates with alveolopalatal counterparts (/tɕ/), and produced unaspirated retroflex affricates (/tʂ/) as alveolar variants ([ts]). For manner distinctions, aspirated affricates (e.g., /tsʰ/, /tʂʰ/, /tɕʰ/) were often misproduced as their homorganic fricatives (/s/, /ʂ/, /ɕ/). While these findings partially support both CAH and SLM, they suggest the need for models to incorporate more detailed phonetic specifications to fully account for L2 production patterns.

  • Kazunori Suzuki, Shinichiro Koyanagi, Takayuki Hidaka
    Article ID: e25.83
    Published: 2026
    [Advance online publication] Released: 2026/01/09
    Journal Open access Advance online publication

    This study introduced laser-induced acoustic pulses as sound sources for 1:10 scale model experiments in concert halls. These pulses exhibit high sound pressure, excellent reproducibility, short transient response, and broadband frequency range up to 100 kHz, including instrumental harmonics, making them suitable as sound sources for auralization. To measure three-dimensional sound fields in the scale model, a single microphone was sequentially moved to multiple locations on virtual spherical surfaces with radii of a few millimeters, centered at the receiving point, and room impulse responses were measured at each location. To improve both sound quality and spatial resolution during playback, measurements were performed on three concentric virtual spherical surfaces with different radii, creating fourth-order ambisonics signals covering the audible range from 50 Hz to 10 kHz. This approach ensured sound reproduction and clear sound localization compared to conventional methods, thus enabling high-quality auralization of orchestral performances using 1:10 scale models.

  • Maoto Mizutani, Kenta Iwai, Masato Nakayama, Takanobu Nishiura, Yoshih ...
    Article ID: e25.95
    Published: 2025
    [Advance online publication] Released: 2025/12/25
    Journal Open access Advance online publication

    A conventional feedforward active noise control system suffers from degraded performance of noise reduction due to the causality constraint caused by processing and propagation delays. To address this issue, we propose a multichannel feedforward active noise control system that combines an optical laser microphone and an air-conduction microphone. The proposed system relaxes the causality constraint by acquiring vibration information at the speed of light using the optical laser microphone and enhances coherence using the signal of the air-conduction microphone. Experiments using a laser Doppler vibrometer and an electret condenser microphone demonstrate superior noise reduction compared to the conventional system.

  • Tamio Sasagawa
    Article ID: e25.65
    Published: 2025
    [Advance online publication] Released: 2025/12/23
    Journal Open access Advance online publication

    The motion of a polystyrene foam sphere in a Kundt’s tube is investigated and confirmed to be driven by the primary acoustic radiation force. When spheres are placed in a straight line in the direction of the tube axis, they align along the tube axis at certain intervals. This phenomenon is investigated using the balance between the primary and secondary acoustic radiation forces acting on the spheres and also using the equation of motion for the spheres. When spheres are close to one another, they exert secondary acoustic radiation forces on each other and adhere, forming lines in the direction perpendicular to the tube axis. These lines are found to be arranged at regular intervals in the direction of the tube axis by the two types of acoustic radiation forces, forming stable striae. The spacing between adjacent striae depends on factors such as sound frequency and sphere size.

  • Kenji Kurakata, Tazu Mizunami, Kazuma Matsushita
    Article ID: e25.78
    Published: 2025
    [Advance online publication] Released: 2025/12/16
    Journal Open access Advance online publication

    An ISO standard on the reference equivalent threshold sound pressure levels for circumaural earphones was established in 2004. This international standard enabled the use of circumaural earphones for pure-tone audiometry, along with supra-aural earphones, which had long been used. However, some reports describe that the hearing threshold levels obtained using supra-aural earphones and circumaural earphones did not always agree even when the same subjects were tested under identical measurement conditions. Clinicians might become confused by this disagreement when the use of circumaural earphones becomes popular. In this study, we examined subjects by pure-tone audiometry using several earphones of both types and an identical measurement procedure. On the basis of the measurement results, the expected amount of threshold difference relative to reference circumaural earphones was calculated for supra-aural earphones of each model.
