-
Keiichi YASU, Kei KOBAYASHI, Masayuki SATO
2025 Volume 5 Issue 1 Article ID: SC-2025-1
Published: January 25, 2025
Released on J-STAGE: March 16, 2025
RESEARCH REPORT / TECHNICAL REPORT
RESTRICTED ACCESS
This paper proposes the use of openMHA in hearing aid and cochlear implant education. openMHA, an open-source hearing aid simulation platform, provides access to signal processing algorithms and supports flexible customization. At Tsukuba University of Technology, hands-on activities such as hearing aid fitting measurements and cochlear implant simulations are conducted, but challenges remain in hands-on algorithm design and real-time processing. By integrating openMHA, students can simulate hearing aid characteristics, learn algorithm design, and actively conduct experiments and verifications. In addition, evaluating its operation on compact computers enhances the hands-on learning environment. This approach aims to equip students with skills in designing and adapting assistive technologies, thereby enhancing their basic knowledge as professionals in the field.
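To make the kind of processing concrete, the following is a minimal sketch of the multiband dynamic-range compression that hearing aid platforms such as openMHA implement; it is written in plain Python/NumPy as a teaching illustration and does not use openMHA's actual configuration syntax or API, and the band edges and compression parameters are assumptions.

```python
# Conceptual sketch only (not openMHA's API): a simple multiband dynamic-range
# compressor of the kind students could prototype before moving to openMHA.
import numpy as np
from scipy.signal import butter, sosfilt

def compress_band(x, fs, band, threshold_db=-40.0, ratio=3.0):
    """Band-pass one channel and apply a static compression gain above threshold."""
    sos = butter(4, band, btype='bandpass', fs=fs, output='sos')
    y = sosfilt(sos, x)
    rms_db = 20 * np.log10(np.sqrt(np.mean(y ** 2)) + 1e-12)
    # Above threshold, attenuate according to the compression ratio; below, leave unchanged.
    gain_db = min(0.0, (threshold_db - rms_db) * (1.0 - 1.0 / ratio))
    return y * 10 ** (gain_db / 20)

def multiband_compressor(x, fs, edges=(250, 1000, 4000)):
    """Sum of independently compressed frequency bands (hypothetical band edges)."""
    bands = list(zip(edges[:-1], edges[1:]))
    return sum(compress_band(x, fs, b) for b in bands)
```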
-
Evaluation of the Clear Voice Function of a TV Speaker for Older Adults
Yoshiki NAGATANI, Kazuki TAKAZAWA, Yasuyuki MATSUURA, Taiki KASAI, Nat ...
2025 Volume 5 Issue 1 Article ID: SC-2025-2
Published: January 25, 2025
Released on J-STAGE: March 16, 2025
RESEARCH REPORT / TECHNICAL REPORT
RESTRICTED ACCESS
It is widely known that high-frequency hearing gradually declines with age, even in the absence of any particular illness. Since it has also been reported that older adults spend, on average, several hours per day watching television, being able to hear TV audio clearly may improve their quality of life. In this study, a monosyllable speech intelligibility test was conducted on participants aged 60 or older using a TV speaker equipped with a high-frequency emphasis processing function and a commercially available TV ranked highly in sales. The results showed that speech intelligibility with the tested TV speaker was statistically significantly higher than with the TV, regardless of the degree of hearing level deterioration. Moreover, in the group with deteriorated hearing levels, the improvement in speech intelligibility obtained by using the high-frequency emphasis function was significantly greater than in the group with normal hearing. Furthermore, observations of confusion tendencies indicated that the high-frequency emphasis function used in this experiment may have been useful in improving the intelligibility of nasal sounds in addition to plosive and fricative sounds.
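As an illustration of what high-frequency emphasis processing can look like, the sketch below boosts the band above an assumed cutoff by mixing a high-pass-filtered copy back into the signal; the cutoff frequency and gain are illustrative choices, and the tested speaker's actual processing is not described here.

```python
# Minimal sketch of one possible high-frequency emphasis: add a scaled
# high-pass copy of the signal on top of the original. Cutoff and boost
# values are assumptions, not the tested product's parameters.
import numpy as np
from scipy.signal import butter, sosfilt

def hf_emphasis(x, fs, cutoff_hz=2000.0, boost_db=6.0):
    sos = butter(2, cutoff_hz, btype='highpass', fs=fs, output='sos')
    hf = sosfilt(sos, x)
    gain = 10 ** (boost_db / 20) - 1.0   # extra amount added on top of the original
    return x + gain * hf                 # ~boost_db of boost well above the cutoff
```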
-
Hideki KAWAHARA, Ken-Ichi SAKAKIBARA, Kohei YATABE
2025 Volume 5 Issue 1 Article ID: SC-2025-3
Published: January 25, 2025
Released on J-STAGE: March 16, 2025
RESEARCH REPORT / TECHNICAL REPORT
RESTRICTED ACCESS
We propose to implement the short-term Fourier transform based on the discrete Fourier transform of the whole signal, without introducing the conventional frame-wise procedure. We apply this implementation to the investigation of voice excitation source analysis.
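As a rough NumPy illustration of the idea, the trajectory of one STFT bin can be obtained from DFTs of the whole signal (the filter-bank view), with no explicit frame loop; index conventions and boundary handling are simplified here and are not necessarily those of the authors' formulation.

```python
# Illustrative sketch: whole-signal DFTs replace the frame-wise STFT loop.
import numpy as np

def stft_bin_trajectory(x, win, k, n_fft):
    """k-th DFT coefficient of a sliding analysis window at every one-sample shift."""
    N = len(x)
    m = np.arange(len(win))
    # Analysis window modulated to the centre frequency of bin k.
    h = win * np.exp(1j * 2 * np.pi * k * m / n_fft)
    # Circular cross-correlation via one pair of whole-signal FFTs:
    # result[n] = sum_m x[n+m] * win[m] * exp(-2j*pi*k*m/n_fft)
    X = np.fft.fft(x, N)
    H = np.fft.fft(h, N)
    return np.fft.ifft(X * np.conj(H))

# e.g. stft_bin_trajectory(x, np.hanning(512), k=10, n_fft=512)
```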
-
Yuto TANAKA, Kota TAKAHASHI
2025 Volume 5 Issue 1 Article ID: SC-2025-4
Published: January 25, 2025
Released on J-STAGE: March 16, 2025
RESEARCH REPORT / TECHNICAL REPORT
RESTRICTED ACCESS
In large theaters, the sound of the performers does not reach the audience in the back seats, so amplification is performed using microphones worn by each performer and main loudspeakers installed in front of the stage. In this setup, regardless of which performer is speaking, the amplified sound is localized at a position determined by each audience member's listening position, so when multiple people are on stage it can be difficult to identify who is speaking. In this study, we aim to develop a system that lets listeners instantly identify the speaker, without adversely affecting the audience's listening, by processing only the output of an auxiliary sound source device in the time-frequency plane while leaving the main loudspeaker output untouched. Experiments showed that the loudness of the auxiliary sound source can be reduced by about 13 dB by limiting it in both the frequency and time directions. In this paper, we also describe an implementation of this method on an FPGA-based real-time audio signal processing system.
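As a very rough sketch of the general idea only, the code below restricts an auxiliary signal to a limited frequency band and to short time segments in the STFT domain so that it adds little loudness; the band, duty cycle, and frame parameters are hypothetical and do not reproduce the authors' algorithm or their 13 dB result.

```python
# Hypothetical illustration of time-frequency limiting of an auxiliary source.
import numpy as np
from scipy.signal import stft, istft

def limit_time_frequency(x, fs, band=(1000.0, 4000.0), duty=0.25, seg_frames=8):
    f, t, Z = stft(x, fs=fs, nperseg=512)
    # Frequency-direction limiting: zero bins outside the chosen band.
    Z[(f < band[0]) | (f > band[1]), :] = 0.0
    # Time-direction limiting: keep only the first fraction of every block of frames.
    keep = (np.arange(Z.shape[1]) % seg_frames) < int(duty * seg_frames)
    Z[:, ~keep] = 0.0
    _, y = istft(Z, fs=fs, nperseg=512)
    return y
```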
-
Calm vs. angry/sad/happy discrimination
Yukiho HANATANI, Yuta KUROTANI, Karin YAMAZAKI, Hideki KAWAHARA, Toshi ...
2025 Volume 5 Issue 1 Article ID: SC-2025-5
Published: January 25, 2025
Released on J-STAGE: March 16, 2025
RESEARCH REPORT / TECHNICAL REPORT
RESTRICTED ACCESS
To investigate the characteristics of speech emotion perception in older people with hearing loss (HL), emotion discrimination experiments were conducted using speech morphed between two emotions. In last year's discrimination experiment, the abilities to discriminate anger, sadness, and happiness were compared between older participants and young normal-hearing (YNH) participants listening to normal and simulated-HL sounds. It was found that the older participants had more difficulty discriminating anger-happiness pairs than the YNH. Based on this finding, we formulated a working hypothesis that, in Russell's circumplex model of emotion, the older participants would have difficulty judging "pleasantness-unpleasantness" at the same level of "arousal". To test this hypothesis, we added new "calm" stimuli and conducted a discrimination experiment for the anger-calmness, sadness-calmness, and happiness-calmness pairs. The results showed that the discrimination threshold for sadness-calmness was relatively low, rejecting the working hypothesis. In all other conditions, the just-noticeable differences (JNDs) of discrimination for the older participants, whether they had HL or not, were generally larger than those of the YNH. Furthermore, there was no significant difference in JNDs between the YNH listening to normal and simulated-HL sounds. These results suggest that dysfunction due to peripheral HL is not the only factor leading to the deterioration of emotion perception characteristics.
-
Oli Jan
2025 Volume 5 Issue 1 Article ID: SC-2025-6
Published: January 25, 2025
Released on J-STAGE: March 16, 2025
RESEARCH REPORT / TECHNICAL REPORT
RESTRICTED ACCESS
Past studies have shown that voice-acting performance in Taiwan has not been well received by audiences and is rejected by many as unnatural [1]. However, a solid linguistic analysis of this performance has been missing. Combining the author's training in linguistics and voice acting, this study addresses two questions: How exactly is the performance received? What are its phonetic and phonological characteristics? For the first question, this study selected 10 lines of dialogue from Japanese anime and played the voice-acting performances of these lines, in their original Japanese versions and translated Taiwanese Mandarin versions, respectively to Japanese and Taiwanese audiences. Lines were chosen from daily-life scenarios, spoken by characters without exaggerated voice expressions. The audiences were then asked to rank the standardness (of pronunciation) and naturalness (of performance) of the lines. While the Japanese audience ranked the lines as both standard and natural, the Taiwanese audience ranked the lines as standard but significantly less natural. For the second question, 10 informants who are native speakers of Taiwanese Mandarin were recruited to record the lines. Among the informants, 4 were professional voice actors and 6 were non-voice actors. The two groups' recordings were then compared in terms of phonetic and phonological characteristics, including retroflexion, downdrifting, contraction, pitch contour, and PVI (pairwise variability index). The recordings from the two groups were found to differ in these features, apart from some common downdrifting patterns. Finally, this study attempts to attribute the Taiwanese audience's reception to the phonetic and phonological features analysed above, proposing an association between the two that can be explained by the sociolinguistic context in Taiwan.
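For reference, the pairwise variability index commonly used in speech rhythm research is shown below in its normalized form (nPVI); whether the raw or normalized variant was used in this study is not stated here.

```python
# Normalized pairwise variability index (nPVI) of successive interval durations,
# e.g. successive vowel or syllable durations in seconds.
def npvi(durations):
    pairs = zip(durations[:-1], durations[1:])
    terms = [abs(a - b) / ((a + b) / 2.0) for a, b in pairs]
    return 100.0 * sum(terms) / len(terms)

# Example: npvi([0.12, 0.20, 0.15, 0.30])
```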
-
Shogo MURAKAMI, Yasufumi UEZU, Masashi UNOKI
2025 Volume 5 Issue 1 Article ID: SC-2025-7
Published: January 25, 2025
Released on J-STAGE: March 16, 2025
RESEARCH REPORT / TECHNICAL REPORT
RESTRICTED ACCESS
This study investigated whether auditory-speech motor control of vowel F1 depends on the speaker's native language by conducting a formant-altered auditory feedback experiment with native speakers of Japanese, Chinese, and Vietnamese. The target vowels for production were the native vowel /e/ and the English vowel /æ/, and compensatory responses to F1 perturbation were analyzed. The following four findings were obtained. (1) Among Japanese native speakers, no differences were observed in mean F1 and F2 between the native and English vowels, and F1 compensation under the F1-decrease condition was greater than under the F1-increase condition. (2) Among Chinese native speakers, differences in mean F1 and F2 between the native and English vowels were observed, and F1 compensation under the F1-increase condition was greater than under the F1-decrease condition. (3) Among Vietnamese native speakers, no differences were observed in mean F1 and F2 between the native and English vowels, and there were no differences in F1 compensation between the F1-increase and F1-decrease conditions. (4) A cross-language comparison showed differences in F1 compensation under the F1-increase condition for the English vowel. These results indicate that the learning of auditory-speech motor control of F1 for the vowel /e/ depends on the speaker's native language.
-
Shintaro DOAN, Kazuyuki TOKIMASA, Ayako YAMAMOTO, Toshio IRINO
2025 Volume 5 Issue 1 Article ID: SC-2025-8
Published: January 25, 2025
Released on J-STAGE: March 16, 2025
RESEARCH REPORT / TECHNICAL REPORT
RESTRICTED ACCESS
The number of older people with age-related hearing loss is increasing, and the development of speech and hearing technologies beyond current hearing aids is desired. To this end, an environment in which developers of signal processing algorithms can directly listen to processed sounds, together with objective evaluation indices that enable accurate predictions, would effectively promote development. In this study, to explore the feasibility of such an environment, the effectiveness of the POGO, NAL-RP, and DSLm[i/o] prescriptions was evaluated through speech intelligibility (SI) experiments with young normal-hearing listeners using a hearing loss simulator, WHIS. The results showed that SI was improved by DSLm[i/o], NAL-RP, and POGO, in that order, but remained lower than without the simulated hearing loss. We attempted to predict these results using the published objective intelligibility metric GESI, but found that the prediction overestimated the subjective SI and that there is room for improvement.
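To illustrate what a prescription formula prescribes, the sketch below implements the textbook POGO insertion-gain rule (half the hearing threshold level, minus 10 dB at 250 Hz and 5 dB at 500 Hz); NAL-RP and DSLm[i/o] use more elaborate rules and are not reproduced here, and this is not the experimental code used in the study.

```python
# Textbook POGO rule for insertion gain, shown only as an illustration of how
# a prescription maps an audiogram to gains.
POGO_CORRECTION_DB = {250: 10.0, 500: 5.0}

def pogo_insertion_gain(audiogram_db_hl):
    """audiogram_db_hl: {frequency in Hz: hearing threshold level in dB HL}."""
    return {f: 0.5 * htl - POGO_CORRECTION_DB.get(f, 0.0)
            for f, htl in audiogram_db_hl.items()}

# Example: pogo_insertion_gain({250: 40, 500: 45, 1000: 50, 2000: 60, 4000: 70})
```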
-
Erina S. ONOE, Kazuaki HONDA, Chihiro ITOI, Shimpei YAMAGISHI, Haruna ...
2025 Volume 5 Issue 1 Article ID: SC-2025-9
Published: January 25, 2025
Released on J-STAGE: March 16, 2025
RESEARCH REPORT / TECHNICAL REPORT
RESTRICTED ACCESS
Listening Difficulties and Auditory Processing Disorder (LiD/APD) is a condition in which an individual has difficulty hearing speech despite standard pure-tone audiometric tests showing results within the normal range. Although it is known that people with LiD/APD have a wide variety of symptoms, the underlying mechanisms remain unclear. In this study, we focused on acoustic feature processing at the brainstem in adults with LiD/APD and evaluated their sensitivity to interaural level difference (ILD) and interaural time difference (ITD). The results showed that some LiD/APD subjects had higher thresholds for both ILD and ITD than those of the control group, while others showed thresholds similar to the control group. This suggests that there is no single mechanism underlying the symptoms of hearing difficulties.
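As a simple sketch of the kind of manipulation used to measure lateralization thresholds, the code below imposes an ILD or ITD on a diotic signal; the stimulus, level, and delay values are illustrative and not the authors' exact paradigm.

```python
# Illustrative ILD/ITD stimulus generation (assumed parameters).
import numpy as np

def apply_ild_itd(x, fs, ild_db=0.0, itd_us=0.0):
    """Return a (2, N) stereo array: right channel attenuated by ild_db and delayed by itd_us."""
    delay_samples = int(round(itd_us * 1e-6 * fs))
    right = np.roll(x, delay_samples) * 10 ** (-ild_db / 20)  # circular shift; fine for a short tone
    return np.vstack([x, right])

# Example: 10 dB ILD and 500 us ITD favouring the left ear on a 1 kHz tone.
fs = 48000
t = np.arange(int(0.3 * fs)) / fs
tone = 0.1 * np.sin(2 * np.pi * 1000 * t)
stereo = apply_ild_itd(tone, fs, ild_db=10.0, itd_us=500.0)
```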
-
Head-fixed classical conditioning revealed laser-evoked perception
Yuta Tamai, Miku Uenaka, Jumpei Matsumoto, Julia Löschner, Koji Toda, ...
2025 Volume 5 Issue 1 Article ID: SC-2025-10
Published: January 25, 2025
Released on J-STAGE: March 16, 2025
RESEARCH REPORT / TECHNICAL REPORT
RESTRICTED ACCESS
Over the last decade, there have been discussions about applying infrared laser stimulation to brain–machine interfaces, such as cochlear implants, because of its capability of activating spatially selective neural populations. We established transtympanic laser stimulation by leveraging its contactless nature. Laser stimulation evoked auditory perception in an intensity-dependent manner. Furthermore, the simultaneous combination of auditory and laser stimulation induced greater auditory perception than either auditory or laser stimulation alone. These results suggest that laser stimulation can elicit and potentially enhance auditory perception, holding promise for implementation in auditory prostheses.
-
Keiji TABUCHI
2025 Volume 5 Issue 1 Article ID: SC-2025-11
Published: January 25, 2025
Released on J-STAGE: March 16, 2025
RESEARCH REPORT / TECHNICAL REPORT
OPEN ACCESS
-
Ryosuke O. TACHIBANA, Oli JAN, Toshiki ARAKI, Hiroko TERASAWA
2025 Volume 5 Issue 1 Article ID: SC-2025-12
Published: January 25, 2025
Released on J-STAGE: March 16, 2025
RESEARCH REPORT / TECHNICAL REPORT
RESTRICTED ACCESS
This study explores the relationship between speech production and rhythmic motor control by focusing on the interference phenomenon between rhythm tapping and speech. Experiments were designed to measure disruptions in rhythm when participants performed speech tasks while tapping in either regular or swing rhythms. Factors contributing to the interference effect, including articulatory precision, tapping accuracy, and musical experience, were also analyzed. The results suggest that speech and tapping share common rhythm control mechanisms. This research contributes to understanding the connection between rhythmic motor control in vocal learning animals and the neural basis of human speech and motor control.
-
Mai KUROKI, Masataka NISHIMURA
2025 Volume 5 Issue 1 Article ID: SC-2025-13
Published: January 25, 2025
Released on J-STAGE: March 16, 2025
RESEARCH REPORT / TECHNICAL REPORT
RESTRICTED ACCESS
Most conventional musical instruments are not designed for everyone, including people with disabilities. Here, we propose a "new instrument" with a higher level of universal design that gives everyone, including infants, older adults, and people with disabilities, a chance to enjoy music, by establishing the fundamental technologies needed to realize it. For the implementation, we use tablet computers and change the intensity and/or pitch of the played sound based on the tracking data of the user's finger. The technical challenge is that tracking resolution differs depending on the touch panel (or tracking device). In the present study, we show tracking intervals measured from several touch panels and discuss how the tracking data should be interpolated to suppress differences in the played sounds across tablet computers.
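One minimal way to make tracking data comparable across panels, shown purely as an assumption-laden sketch rather than the authors' method, is to resample the irregularly timed touch positions onto a uniform time grid before driving the sound synthesis.

```python
# Illustrative resampling of irregular touch events onto a uniform control rate.
import numpy as np

def resample_touch_track(timestamps_s, positions, rate_hz=200.0):
    """Linearly interpolate touch positions onto a uniform grid at rate_hz."""
    t_uniform = np.arange(timestamps_s[0], timestamps_s[-1], 1.0 / rate_hz)
    return t_uniform, np.interp(t_uniform, timestamps_s, positions)

# Example: events reported at ~60 Hz map onto a 200 Hz control track.
t_uniform, pos = resample_touch_track([0.000, 0.017, 0.034, 0.051], [0.0, 0.2, 0.5, 0.6])
```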
-
Soichiro MATSUDA
2025 Volume 5 Issue 1 Article ID: SC-2025-14
Published: January 25, 2025
Released on J-STAGE: March 16, 2025
RESEARCH REPORT / TECHNICAL REPORT
OPEN ACCESS
The rise of generative AI tools, including Large Language Models (LLMs), is beginning to have a profound impact on psychologists exploring human behavior. These tools hold promise for both the analysis of linguistic data and the establishment of novel experimental paradigms. However, studies examining control variables in human-human interactions using spoken language remain limited, highlighting the constraints of analyses focused solely on spoken language. This presentation introduces behavioral design research in human-human interaction scenarios and aims to explore the potential for interdisciplinary research on human-human interactions.
-
Keiichi YASU, Yoshiji YASU
2025 Volume 5 Issue 1 Article ID: SC-2025-15
Published: January 25, 2025
Released on J-STAGE: March 16, 2025
RESEARCH REPORT / TECHNICAL REPORT
RESTRICTED ACCESS
This study proposes an online environment, "speechcomparison," to support speech and articulation training for students with hearing impairments. The system visualizes acoustic features of self-produced speech and provides visual feedback to improve speech clarity. Built on a Python server (Flask) and a web client, the system offers functionalities for displaying audio waveforms and spectrograms, recording, saving, and comparing audio data. It also allows flexible management of sample and tester databases, enabling intuitive operations for recording and comparing speech data. Future work includes long-term usability evaluations by individuals with hearing impairments and the integration of acoustic features related to speech clarity to enhance the effectiveness of speech training further.
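As a generic illustration of the described architecture (a Flask server that returns a spectrogram for an uploaded recording), the sketch below is self-contained but entirely hypothetical: the route name, parameters, and file handling are assumptions and do not reflect the actual "speechcomparison" implementation.

```python
# Hypothetical Flask endpoint returning a spectrogram image for an uploaded recording.
import io
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import soundfile as sf
from flask import Flask, request, send_file

app = Flask(__name__)

@app.route("/spectrogram", methods=["POST"])
def spectrogram():
    x, fs = sf.read(io.BytesIO(request.files["audio"].read()))
    fig, ax = plt.subplots()
    ax.specgram(x if x.ndim == 1 else x[:, 0], Fs=fs, NFFT=512, noverlap=384)
    ax.set_xlabel("Time [s]")
    ax.set_ylabel("Frequency [Hz]")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    buf.seek(0)
    return send_file(buf, mimetype="image/png")

if __name__ == "__main__":
    app.run()
```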
-
Addressing Issues in Voice Conversations from the Perspective of Deaf and Hard of Hearing Individuals
Chiemi WATANABE
2025 Volume 5 Issue 1 Article ID: SC-2025-16
Published: January 25, 2025
Released on J-STAGE: March 16, 2025
RESEARCH REPORT / TECHNICAL REPORT
RESTRICTED ACCESS
This paper discusses the challenges in communication between deaf/hard of hearing and hearing individuals and initiatives to address these challenges. Voice conversations have real-time and volatile characteristics, where information disappears instantly. While many people experience these issues, they pose particularly serious challenges for deaf and hard of hearing individuals. Through dialogue experiments using speech recognition tools, we identified issues such as hearing participants' insufficient verification of transcription tools and how these tools can inadvertently mask signals that deaf/hard of hearing individuals cannot hear. Furthermore, we identified fundamental challenges including the rapid pace of speaker transitions, low tolerance for silence, speakers' overestimation of their communicative effectiveness (illusion of transparency), and considerations for face-threatening acts. Based on these findings, we describe our ongoing research and propose that beyond mere technical support, mutual understanding among conversation participants and the establishment of conversational environments (including rule-setting) are necessary.