Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
Advance online publication
  • Shogo Fukawa, Takashi Nose, Shuhei Imai, Akinori Ito
    Article ID: e24.47
    Published: 2024
    Advance online publication: July 26, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    This paper proposes a voice conversion method named SpSiVC that appropriately converts both speech and singing voices with a single model. Because the distribution of pitch between speakers differs significantly for speech and singing voices, voice conversion has mainly been evaluated as two separate tasks: speech conversion and singing voice conversion. SpSiVC introduces an adaptive F0 loss, which enables conversion that implicitly switches the shift width of the logarithmic F0 according to the type of input voice. We examine the effectiveness of the F0 constraints in objective and subjective evaluations.

    Download PDF (275K)
  • Kenko Ota
    Article ID: e24.53
    Published: 2024
    Advance online publication: July 26, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    Reducing the burden of data collection is crucial for advancing speech recognition research. Hence, this research focuses on exploring methods to enhance machine learning from limited data by augmenting the training data based on three-dimensional measurements in the field of Japanese silent speech recognition. We compared the connectionist temporal classification losses during training and the recognition performance with and without key data augmentation techniques to evaluate the effectiveness of the proposed method utilizing the direct linear transformation method. In this case, the deep neural network was trained successfully, resulting in a reduced phoneme error rate.

    Download PDF (610K)
  • Kazuya Yokota, Masataka Ogura, Masajiro Abe
    Article ID: e24.55
    Published: 2024
    Advance online publication: July 20, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    Recently, physics-informed neural networks (PINNs) have garnered attention for use as a numerical simulation method for inverse analysis, such as property identification. However, studies on PINNs for conducting acoustic analysis are scarce. Thus, this study developed PINNs that performed acoustic analysis of the vocal tract and synthesized voiced sounds. In addition, PINNs were used to identify glottal source waveforms. Consequently, PINNs were demonstrated to be a promising solution for the inverse problem related to speech production.

    Download PDF (555K)
  • Shinsuke Nakanishi
    Article ID: e24.33
    Published: 2024
    Advance online publication: July 18, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    An acoustic metasurface (AMS) can provide high broadband sound absorption realized by planar periodic array assembly of small Helmholtz resonators tuned at different resonant frequencies. The formulation for the sound absorption coefficient of the AMS developed in a previous study provides a uniform and nearly perfect sound absorption in the one octave band. Numerical case studies suggest that the area ratio of the unit cell assembly for the broadband perfect sound absorption of the AMS can be formulated as a power function of frequency ratio to the center frequency and that the formulation will be identical for any center frequency.

    Download PDF (1982K)
  • ―Three audio representations: channel-based, object-based, and scene-based―
    Takehiro Sugimoto
    Article ID: e24.65
    Published: 2024
    Advance online publication: July 18, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    Currently, there are three mainstream audio representations, namely channel-based audio, object-based audio, and scene-based audio. The features of content expression differ among these audio representations, the details of which have been specified in the International Telecommunication Union: Radiocommunication Sector (ITU-R) Recommendations. The effective use of these audio representations in accordance with what is to be expressed in the content requires a deep understanding of the technical specifications and capabilities of the audio representations. This review first traces the evolution of loudspeaker layouts developed in recent years, i.e., a history of multichannelization, which is indispensable for the understanding of audio representations. Then, the position of each audio representation among various audio-related standards is described and the method of adopting and implementing each audio representation in other audio-related standards is reviewed using the Moving Picture Experts Group (MPEG) standards as examples.

    Download PDF (6484K)
  • Mio Yonezawa, Naohisa Inoue
    Article ID: e24.52
    Published: 2024
    Advance online publication: July 12, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    Two finite element models have recently been utilized to predict the sound absorption coefficient of periodically arranged slit resonators. One is based on the linearized Navier–Stokes equation, entropy conservation law, and mass conservation law. The other is based on the Helmholtz equation with the visco-thermal boundary layer boundary condition. This paper clarifies a condition under which the latter model gives a reasonable approximation. A transition frequency is derived from Johnson–Allard’s effective density of porous media as a criterion. Calculation results demonstrate that the resonator’s resonance frequency must be higher than the transition frequency to obtain a reasonable prediction using the boundary condition model.

    Download PDF (1312K)
  • Fumiyoshi Matano, Yuya Tagusari, Takanori Horibe, Junya Koguchi, Masan ...
    Article ID: e24.34
    Published: 2024
    Advance online publication: July 05, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    State-of-the-art text-to-speech systems have improved in sound quality, so an increasingly large number of subjects is needed to detect differences in MOS evaluation, which uses a five-point scale. The MUSHRA method can detect differences in sound quality more precisely than the MOS method because sound quality is rated on a relative 101-point scale from 0 to 100. However, it has the drawback of requiring a hidden reference and anchors; thus, it cannot detect cases that exceed the hidden reference. Our method, named Taut-MUSHRA, requires no hidden reference or anchors and instead imposes two constraints on the subjects. As a result, our Taut-MUSHRA method detected differences in sound quality more sensitively than the MOS method.

    Download PDF (188K)
  • Takayuki Arai
    Article ID: e24.40
    Published: 2024
    Advance online publication: July 05, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    We have developed a prosthetic device for speech sound disorders based on our earlier vocal-tract model. The proposed device mainly consists of a mouthpiece, lip plates, and an imitation tongue. We first estimated the vocal-tract area functions, particularly when the tongue is at the resting position and when it is raised. We then tested the output sounds produced by a human speaker using the device with different configurations of the imitation tongue and open/close gestures of the lip plates. The results showed that, while the prosthetic device produced sounds of only moderate quality, the phrases became more intelligible.

    Download PDF (890K)
  • Tatsuhiro Tanaka, Makoto Otani
    Article ID: e24.16
    Published: 2024
    Advance online publication: June 18, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    Perfectly diffuse sound fields can be modeled by using an infinite number of random coefficients of spherical wave functions or random plane waves. The statistical properties of the spherical harmonic coefficients and the plane-wave amplitudes in these models directionally define the perfect diffuseness, yielding the established theoretical formulations of directionally characterized perfect diffuseness. However, sound fields in real rooms are not perfectly diffuse. These models and formulations of perfect diffuseness thus provide limited insight into understanding pseudo-perfect diffuseness, the diffuseness of real sound fields. This thereby requires a theory efficaciously describing pseudo-perfectly diffuse sound fields. Here, we rigorously compare existing formulations of directionally characterized pseudo-perfect diffuseness, aiming to determine its better formulation for the diffuseness evaluations of real sound fields. Our theoretical and numerical results indicate the advantages of the formulation using random coefficients of truncated- or finite-degree spherical wave functions over the formulation using a finite number of random plane waves.

    Download PDF (3186K)
  • Hideki Kawahara, Masanori Morise
    Article ID: e24.43
    Published: 2024
    Advance online publication: June 13, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    We generalized a voice morphing algorithm so that it can handle temporally variable morphing with multiple attributes and multiple instances. The generalized morphing provides a new strategy for investigating speech diversity. However, excessive complexity and the difficulty of preparation have prevented researchers and students from enjoying its benefits. To address this issue, we introduced a set of interactive tools to make preparation and tests less cumbersome. These tools are integrated into our previously reported interactive tools as extensions. The extended tools were introduced successfully in graduate-level lessons. Finally, we outline further extensions to explore excessively complex morphing parameter settings.

    Download PDF (2048K)
  • Koichi Mori
    Article ID: e24.37
    Published: 2024
    Advance online publication: June 08, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    The aim of this review is to introduce the concept of neurodiversity as used for developmental stuttering. Since the introduction of the ICF by WHO in 2001, the social model has been introduced into clinical practice. However, it primarily asks the community to be responsible for the accommodation of persons with disabilities (PDs). In addition to the necessity of changes in the legal and legislative environments to conform to the Convention on the Rights of Persons with Disabilities of the United Nations (2006), effective education and advocacy are needed for society to acknowledge and reduce biases of ableism and stigma of disabilities. Ableism is the claim that society is for able-bodied and able-minded people. Ableist remarks and behaviors, called microaggressions, may impact PDs adversely. The diversity movement tries to embrace PDs by removing the border between the able and the disabled. The etiology and characteristics of developmental stuttering are depicted, as well as its neurodiverse and complex nature. The recent advances in the treatment of stuttering without ableism are introduced. Education and advocacy of (neuro)diversity and inclusion in society are still sorely needed for medical and welfare professionals as well as for the general public.

    Download PDF (422K)
  • Junta Tagusari
    Article ID: e24.50
    Published: 2024
    Advance online publication: June 01, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    Areas in the vicinity of trunk roads are exposed to high levels of noise and may pose high health risks to residents. To assess the health risks and formulate effective noise mitigation measures, prediction of road traffic noise is crucial. Addressing this issue, the author has developed a road traffic noise prediction system that allows prediction of sound levels using a database of road network. The objective of the present study was to predict road traffic noise in the vicinity of the trunk roads using a Digital Road Map Platform (DRM-PF) database, which contains nationwide road geometries and traffic settings of trunk roads. Predicted sound levels were compared with actual measurements and noise maps were created to demonstrate the feasibility of assessing noise exposure and associated health risks in the vicinity of trunk roads in Japan. The results show a generally good agreement between predicted and measured levels, while challenges remain in accurate prediction in a number of environments, mainly due to the lack of accurate geometries. The extensive coverage of the DRM-PF database throughout Japan enables noise mapping in arbitrary regions near trunk roads, which would contribute to making noise policy.

    Download PDF (772K)
  • Toru Miyairi
    Article ID: e24.31
    Published: 2024
    Advance online publication: May 29, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    Consumers value products that provide a multi-sensory experience. This paper provides a summary of our recent work on the design of sensory experiences using the sound symbolism of onomatopoeia with a focus on the operation of rotary switches. The experiments were conducted to collect onomatopoeic expressions that reflect tactile and auditory experiences during switch operation. A quantitative text analysis was conducted to examine the correspondence between these expressions and physical quantities. In the representation of mono-sensory experiences through onomatopoeic expressions, our findings reveal a distinct sound symbolism in onomatopoeic expressions that reflects changes in physical quantities such as click torque for tactile sensations and sound loudness and sharpness for auditory sensations. For tactile and auditory multi-sensory experiences, the onomatopoeic expressions incorporate features of both sensations. Moreover, the results suggest that when tactile and auditory stimuli are combined, the sharpness of the operating sound has the most influence on onomatopoeic expressions. These insights suggest the potential of using the sound symbolism of onomatopoeia for quantitatively designing sensory experiences. This approach could be used to capture consumer intent and incorporate qualitative experiences into product design.

    Download PDF (906K)
  • Kenji Kita
    Article ID: e24.30
    Published: 2024
    Advance online publication: May 23, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    Binaural and transaural systems are well-known methods of sound field reproduction. However, each system has its own problems, such as in-head localization and narrow control points. Therefore, this study aims to design a sound field reproduction system using shoulder-mounted wearable speakers that can solve these problems. The system is expected to sustain the effect of sound field reproduction even while the listener is moving and to find applications in the field of entertainment. This paper shows the head transfer function of the wearable loudspeaker and designs an inverse filter, which is indispensable for the sound field reproduction system, using H∞ control theory.

    Download PDF (434K)
  • Takumasa Tsuruha, Makoto Otani, Yasushi Takano
    Article ID: e23.81
    Published: 2024
    Advance online publication: May 15, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    The elastic modulus of granular materials consisting of non-adhered solid grains depends on the compressive stress. As vertical compressive stress increases with depth due to gravity without any additional force, the elastic modulus of granular materials also increases along the gravitational direction. This study investigates the sound absorbing characteristics of granular materials with such gradient elasticity for both vertical and horizontal sound propagations, considering applications to vertical walls, horizontal ceilings, and floors. The normal incidence sound absorption coefficients of hollow glass beads, chosen as a representative granular material, were measured using an impedance tube. This measurement was conducted under conditions where the sound wave propagated perpendicular and parallel to the gradient direction of elasticity. The frequency and magnitude of the first peak in the sound absorption coefficient varied with the direction of sound propagation. A method for predicting this variation was proposed, showing that the frequency at which the first peak occurred could be predicted within an error of 10%. Additionally, the effects of the thickness and shape of the absorbing surface on the sound absorption coefficients are elucidated through calculations.

    Download PDF (1232K)
  • Josef Schlittenlacher, Megan Brogan
    Article ID: e24.39
    Published: 2024
    Advance online publication: May 14, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    The irrelevant speech effect (ISE) depends on psychoacoustic features such as the perceived fluctuation of the sound. Thus, it seems likely that hearing loss may also affect the size of the ISE. The present experiment studied the ISE in 30 listeners with normal hearing and 30 listeners with self-reported hearing loss in four different background sounds: speech, music with vocals, instrumental music without lyrics, and steady noise. As expected, short-term memory performance increased in that order for both groups. However, there was no statistically significant interaction between the two groups; both memorized 0.9 digits more in noise than in speech.

    Download PDF (354K)
  • Junya Koguchi, Yuya Tagusari, Masanori Morise
    Article ID: e24.02
    Published: 2024
    Advance online publication: May 11, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    In this study, the assessment of musical performances is investigated, focusing particularly on the difference between performers’ self- and external assessments. Understanding the difference is significant for both the improvement of teaching methods in musical education and the early detection of musician’s dystonia. The research centers on brass instrument players performing long tones, a simple task that reduces skill-based performance variances. Additionally, it incorporates Go/No-go tasks to minimize assessment biases. The results reveal a notable discrepancy between performers’ and third parties’ assessments. We assume that the reasons are the difference in information and the decreased perceptual accuracy of the performers.

    Download PDF (414K)
  • Yoshiko Arimoto, Yasuo Horiuchi, Sumio Ohno
    Article ID: e24.05
    Published: 2024
    Advance online publication: May 11, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    A reliable method of determining the base frequency (Fb) for utterances of various speaking styles is critical to enabling stable command labeling in the Fujisaki model. To achieve stable command labeling for diverse expressions of speech, a linear model was fitted using the 10th-percentile F0 of each utterance from three corpora of various speaking styles (read, acted, and spontaneous) as the independent variable to estimate a consistent Fb for each utterance. To assess the robustness of the model for unknown utterances, the model was applied to test data, including both open and corpus-open data not used for the model development, and the difference between the estimated Fb and the trained labelers’ annotated Fb was calculated. The resulting estimation model fit the manually labeled Fbs well, exhibiting a small root mean squared error (RMSE) of 0.096 and a high coefficient of determination (R²) of 0.89 for the closed dataset. Moreover, the model also exhibited a small RMSE of 0.091 and a high R² of 0.92 for the corpus-open dataset. The results revealed that the proposed model can reliably estimate the Fb of utterances with various speaking styles.
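
    The linear fit and the two reported metrics can be illustrated with a minimal sketch. The data points below are synthetic placeholders, not values from the paper, and the helper names `fit_line`, `rmse`, and `r_squared` are hypothetical. The abstract states only that the 10th-percentile F0 of each utterance serves as the independent variable and that the RMSE and the coefficient of determination are used for assessment; the ordinary least-squares fit is an assumption.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(y_true, y_pred):
    """Root mean squared error between labels and estimates."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic placeholder data (assumed, for illustration only):
# per-utterance 10th-percentile F0 and a manually labeled Fb on the same scale.
p10_f0 = [4.40, 4.60, 4.90, 5.10, 5.30]
labeled_fb = [4.20, 4.45, 4.70, 4.95, 5.10]

slope, intercept = fit_line(p10_f0, labeled_fb)
estimated_fb = [slope * x + intercept for x in p10_f0]
print(f"Fb ~ {slope:.3f} * P10 + {intercept:.3f}")
print(f"RMSE = {rmse(labeled_fb, estimated_fb):.3f}, "
      f"R^2 = {r_squared(labeled_fb, estimated_fb):.3f}")
```

    Evaluating the same fitted coefficients on utterances from a corpus that was held out entirely would correspond to the corpus-open condition described in the abstract.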

    Download PDF (426K)
  • Ryohei Suzuki, Kanae Amino, Takayuki Arai
    Article ID: e24.07
    Published: 2024
    Advance online publication: April 26, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    Human speaker recognition performance can be degraded by various factors. Understanding the factors affecting it and the errors caused by these factors is crucial for forensic applications. To study the effects of noisy environments on human speaker recognition, we conducted a hearing experiment using speech samples of two words by five male speakers, and two noise types (speech-like noise and environmental noise in a boiler room) with three levels of signal-to-noise ratio (∞, 0 dB, or −10 dB). The results suggested that the listeners tended to judge different speakers to be the same speaker rather than vice versa, and this tendency was also affected by the sex of the listener.
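
    Stimuli at a prescribed signal-to-noise ratio are commonly prepared by scaling the noise before adding it to the speech. The following is a minimal sketch of that general technique, not the authors' procedure; the function names and the synthetic tone and Gaussian noise are assumptions standing in for real recordings.

```python
import math
import random


def mean_power(x):
    """Mean squared amplitude of a signal given as a list of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB.

    An infinite SNR returns the clean speech unchanged.
    """
    if math.isinf(snr_db):
        return list(speech)
    # Required noise power is P_speech / 10^(SNR/10); gain is its amplitude ratio.
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """SNR of a mixture, computed from the known clean speech."""
    residual = [m - s for m, s in zip(mixture, speech)]
    return 10.0 * math.log10(mean_power(speech) / mean_power(residual))


# Synthetic stand-ins for recordings (assumed): a 200 Hz tone and Gaussian noise.
rng = random.Random(0)
fs = 16000
speech = [math.sin(2.0 * math.pi * 200.0 * t / fs) for t in range(fs)]
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]

for snr in (0.0, -10.0):
    mixture = mix_at_snr(speech, noise, snr)
    print(f"target {snr:+.0f} dB, measured {measured_snr_db(speech, mixture):+.2f} dB")
```

    The three conditions in the abstract would correspond to `snr_db` values of `math.inf`, `0.0`, and `-10.0`.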

    Download PDF (963K)
  • Irwansyah, Sho Otsuka, Seiji Nakagawa
    Article ID: e24.10
    Published: 2024
    Advance online publication: April 19, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    This study explores the impact of pinna hardness and vibrator placement on the efficacy of bone conduction through the pinna. Hearing thresholds of twelve participants, all without abnormal pinna conditions, were assessed across frequencies ranging from 250 Hz to 8 kHz, with vibrators positioned at three distinct locations—the front of the ear canal, the earlobe, and behind the cymba concha. Additionally, with a focus on consistent variable manipulation in a controlled experimental scenario, four silicone ear models with Shore hardness values from 0A to 45A were utilized to examine vibrational energy transmission via an accelerometer fixed behind the ear canal. The results indicated that vibrator placement significantly influenced hearing thresholds, a pattern that was also observed in the silicone models. However, the anticipated correlation between pinna hardness and hearing thresholds was not significant within the human sample. This could be attributed to less variability in natural pinna hardness than expected. While it is recognized that pinna hardness varies among individuals, our study reveals a less dramatic variation in pinna hardness among individuals, suggesting that its influence on bone conduction may be less critical than other anatomical factors.

    Download PDF (3893K)
  • Yuki Ishizaka, Sho Otsuka, Seiji Nakagawa
    Article ID: e24.28
    Published: 2024
    Advance online publication: April 19, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    The medial olivocochlear reflex (MOCR) is reported to be modulated by the predictability of an upcoming sound occurrence. Here, the relationship between the MOCR and internal confidence in temporal anticipation, evaluated by reaction time (RT), was examined. The timing predictability of the MOCR elicitor was manipulated by adding jitters to preceding sounds. MOCR strength and RT were unchanged in a small (10%) jitter condition, whereas MOCR strength decreased and RT increased significantly in the largest (40%) jitter condition compared with the no-jitter condition. The similarity indicates that the MOCR strength reflects confidence in anticipation, and that the predictive control of MOCR and response execution share a common neural mechanism.

    Download PDF (449K)
  • Leo Misono, Kenji Muto
    Article ID: e24.19
    Published: 2024
    Advance online publication: April 13, 2024
    JOURNAL OPEN ACCESS ADVANCE PUBLICATION

    Cicadas call loudly and interfere with traffic noise measurements. The frequency characteristics of some outdoor cicada sounds have been reported, but the background noise and distance to the cicada have not been considered. The aim of this work was to accurately measure the frequency characteristics of the A-weighted sound pressure level of each robust cicada sound. The frequency characteristics of the /mi/ and /n/ sounds were measured in a free field. The dominant frequencies were 4.7 kHz for the /mi/ sound and 15 kHz for the /n/ sound, and the peak frequencies of both sounds were normally distributed.
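
    For readers unfamiliar with A-weighting, the sketch below evaluates the commonly cited analytic form of the A-weighting curve (as specified in IEC 61672-1) at the two dominant frequencies named in the abstract. It illustrates the weighting only and is not the authors' measurement procedure; the function name `a_weighting_db` is hypothetical.

```python
import math


def a_weighting_db(freq_hz):
    """A-weighting gain in dB at a frequency in Hz (analytic form, about 0 dB at 1 kHz)."""
    f2 = freq_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.00 dB offset normalizes the curve to 0 dB at 1 kHz.
    return 20.0 * math.log10(numerator / denominator) + 2.00


# Dominant frequencies reported in the abstract, plus the 1 kHz reference point.
for label, freq in (("reference", 1000.0), ("/mi/", 4700.0), ("/n/", 15000.0)):
    print(f"{label:>9}: {freq:7.0f} Hz -> {a_weighting_db(freq):+.2f} dB")
```

    Adding this gain to an unweighted band level at the same frequency yields the corresponding A-weighted level.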

    Download PDF (20064K)