Proceedings of the Technical Committee on Speech Communication

Spectrotemporal modulation provides a cue for detecting repetition in noise

Ryosuke O. TACHIBANA, Sotaro KONDOH, Jun NITTA, Kazuo OKANOYA

2024Volume 4Issue 1 Article ID: SC-2024-1
Published: January 19, 2024
Released on J-STAGE: March 20, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2024-1

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

Sound repetition perception is key to understanding our environment. We can detect repeated short bursts of white noise, but longer noise repetitions are harder to notice and cause individual differences. This study explored such individual differences, focusing on spectrotemporal modulation as a cue for repetition perception. We hypothesized that sensitivity to spectrotemporal modulation explains this difference. We measured repetition perception as the d-prime (d’) score and modulation sensitivity as the discrimination threshold. Results revealed a significant correlation between noise repetition detection and modulation sensitivity among participants. Further analysis indicated that musical skill influences modulation sensitivity but not repetition detection directly. This suggests that spectrotemporal modulation is a key cue in the perception of noise repetition.

View full abstract

Download PDF (2856K)
Synchronization of Gamma Waves in the Brains of Elderly People when Listening to 40 Hz-Modulated Sound Stimuli

Yoshiki NAGATANI, Kazuki TAKAZAWA, Rara SHIMAUCHI, [in Japanese], Masa ...

2024Volume 4Issue 1 Article ID: SC-2024-2
Published: January 19, 2024
Released on J-STAGE: March 20, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2024-2

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

Some reports show that sensory stimulation, such as flickering lights and pulse sounds with a 40 Hz cycle, can be useful for treating dementia or suppressing its deterioration. We have previously shown that brain waves are synchronized by listening to amplitude-modulated sounds in addition to pulse sounds. However, the studies were conducted only in young people group. In this study, we employed both elderly and young people to listen to 40-Hz amplitude-modulated sound stimuli to compare the degree of synchronization of gamma waves. Stimuli whose entire sound source of a news or music program was modulated and stimuli only whose part excluding the speech sound was modulated were included. The presentation level was controlled to compensate the loss in hearing level of each participant. The results showed that 40 Hz brain waves were statistically significantly synchronized to all modulated sounds in both age groups showing no clear difference in the degree of brain wave synchronization between the two age groups. This result shows that the gamma-band brain wave is synchronized by presenting

View full abstract

Download PDF (1438K)
Cultural Comparison of Auditory Temporal Processing Based on Statistical Learning of the Brain

Tatsuya DAIKOKU

2024Volume 4Issue 1 Article ID: SC-2024-3
Published: January 19, 2024
Released on J-STAGE: March 20, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2024-3

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

This study investigates how children's songs (nursery rhymes) are acquired through statistical learning. Using a Hierarchical Bayesian Statistical Learning (HBSL) model that emulates the statistical learning system of the brain, we examined perception and generation processes via simulation experiments across English, Germany, Spain, Japan, and Korean songs. We explored how the model's characteristics evolved over 15 learning trials for each song. Furthermore, by utilizing the probability distributions of each model after 15 learning trials, we generated new songs through automated composition. The results suggest that regardless of culture, statistical learning gradually establishes a hierarchical structure of statistical knowledge over 15 learning trials. In terms of generation, statistical learning leads to a gradual increase in delta-band rhythm. These findings indicate that cultural differences may not significantly modulate the effects of statistical learning. Additionally, this study highlights the developmental origins of creativity and the importance of statistical learning during early development.

View full abstract

Download PDF (910K)
Physiological response of mothers to infant crying in stressed situations and effects of odor stimuli

Mingdi XU, Yui TAMADA, Wangshuyao SUN, Yukei HIRASAWA, Mika SHIRASU, M ...

2024Volume 4Issue 1 Article ID: SC-2024-4
Published: January 19, 2024
Released on J-STAGE: March 20, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2024-4

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

The present study aimed to comprehensively investigate the influence of infant-specific odor on mothers' responses to acute stress, encompassing subjective feelings, physiological signals, and biochemical indicators, during a cognitive task accompanied with stressful sounds. Twenty-seven Japanese mothers with infants aged 2-11 months participated in a N-back task while listening to intense infant crying sounds or pink noise sounds, with or without exposure to infant-specific odor. Heart rate variability (HRV) indicators revealed that the unique infant odor appears to predominantly enhance sympathetic nervous system (SNS) activity (increased LF/HF ratio and decreased HFnu) when participants are exposed to infant crying sounds during the stressful N-back task, compared to hearing pink noise. This suggests that the infant odor stimulus, known for elevating oxytocin levels in mothers, may sensitize mothers to infant crying signals even in demanding situations. Despite this heightened autonomic state, the subjective experience of reduced fatigue due to the exposure to infant odor implies a potential enhancement of parenting energy. To our knowledge, our study is the first to identify alterations in physiological responses triggered by exposure to infantspecific odor. Our findings offer valuable insights for future clinical application of infant-specific odor, especially for individuals suffering from postpartum depression, or mothers of children with developmental disorders.

View full abstract

Download PDF (1275K)
Voice source information representation based on the instantaneous frequency and events

Hideki KAWAHARA, Ken-Ichi SAKAKIBARA, Kohei YATABE

2024Volume 4Issue 1 Article ID: SC-2024-5
Published: January 19, 2024
Released on J-STAGE: March 20, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2024-5

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

Repetitive valving of airﬂow is the excitation source of voiced sounds. This repetition rate closely correlates with the perceptual attributes called “pitch.” Since this repetition is approximately regular in voiced sounds, a harmonically related sum of sinusoids models voiced sounds. However, excitation intervals deviate from periodicity in the onset and oﬀset of voiced regions. Moreover, the actual valving of the vocal fold is aﬀected by the eﬀects of muscle, mucosal, airﬂow, and reﬂecting sound waves; consequently, strict harmonic relations do not hold. In this report, we introduce a structured test signal for frequency-modulating the fundamental frequency of the synthetic voiced sounds to investigate relations between the instantaneous frequency of the fundamental component and the repetition rate of valving events.

View full abstract

Download PDF (1467K)
3D spatial radiation characteristics of vocalizations

Katuhiro MAKI, Eriko AIBA, Shunsuke KIDANI, Shigeaki AMANO

2024Volume 4Issue 1 Article ID: SC-2024-6
Published: January 19, 2024
Released on J-STAGE: March 20, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2024-6

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

In this study, we performed simultaneous multi-point measurement of vocalizations using a 42-channel spherical microphone array, and investigated the spatial radiation characteristics of vocalizations. As a result, it was found that frequency components of vocalizations around 700 kHz were radiated diagonally downward in the forward direction of the speaker, frequency components around 3 kHz were radiated diagonally upward in the forward direction, and frequency components over 5 kHz were radiated generally in the front direction. In addition, it was revealed that individual differences in spatial radiation characteristics were appeared from 3 kHz to 8 kHz. Since there are no large individual differences above 3kHz in the frequency spectrum of speech measured in front of the speaker, individual differences in radiation directivity above 3 kHz may be useful as acoustic clues in elucidating the characteristics of voice quality.

View full abstract

Download PDF (2729K)
Remote control of vocal tract models by voice commands via online conference system

Ryohei SUZKI, Rinne KOBAYASHI, Atsushi FUJIOKA, Takayuki ARAI

2024Volume 4Issue 1 Article ID: SC-2024-7
Published: January 19, 2024
Released on J-STAGE: March 20, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2024-7

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

We demonstrated a vocal tract model to remote areas as a measure to cope with an epidemic of COVID-19 and as a measure to spread acoustic education. In this study, we investigated the implementation, evaluation and improvement of a system to recognize voice commands sent from a remote location via an online conference system and to operate a vocal tract model with a robot arm based on the recognized commands, which was proposed by Arai (ICA, 2022).

View full abstract

Download PDF (3400K)
[Invited] Auditory perceptual world in people with visual impairments

Takahito MIURA

2024Volume 4Issue 1 Article ID: SC-2024-8
Published: January 19, 2024
Released on J-STAGE: March 20, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2024-8

RESEARCH REPORT / TECHNICAL REPORT OPEN ACCESS

Show abstractHide abstract

Support systems have been developed to facilitate the active participation of the visually impaired, as the impairment limit information acquisition and activities. However, eﬀorts to improve the quality and quantity of information presented to persons with disabilities are still in the developmental stage due to the unique perception and recognition styles in persons with visual impairment. By clarifying such perception and recognition mechanisms and the current status of assistive technologies, researchers can develop support schemes taking advantage of the abilities of people with visual impairment, and extend their information processing. Therefore, the purpose of this presentation is to organize the unique characteristics of perceptual ability and communication strategies reportedly acquired by persons with visual and hearing impairments and the information assurance methods based on sensory substitution. Particularly, this talk includes a visual substitution system that focuses on sound and tactile information, in addition to their sensory substitution functions, and example applications through support methods in parasports and computer games. The content of this talk is based on the articles written by the author [1], [2] .

View full abstract

Download PDF (184K)
Ensuring Information Accessibility for Deaf and Hard-of-Hearing Passengers During Emergency Announcements in Railway

Naoya HITAKA, Chiemi WATANABE

2024Volume 4Issue 1 Article ID: SC-2024-9
Published: January 19, 2024
Released on J-STAGE: March 20, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2024-9

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

There is a problem that the deaf and hard of hearing (DHH) people cannot understand voice announcements when using trains and cannot grasp the situation. In recent years, displays have been installed in trains to display announcements, but in emergencies such as emergency stops, the driver’s voice announcement is almost always the only one that can be heard. One possible solution for the DHH people is to use the voice recognition tools installed in their smartphones to transcribe announcements. However, the voice recognition rate is low in trains due to the high noise level, and the content cannot be fully understood. In this study, we propose a method for presenting incomplete transcription results recognized by a voice recognition system by correcting them to make them easier to understand, so that DHH people can understand emergency announcements in trains.

View full abstract

Download PDF (1336K)
Evaluation of the wearing position of a cartilage conduction type transducer by adults with conductive hearing loss: Changes in pure tone threshold and speech intelligibility depending on head ﬁxation position

Keiichi YASU, Shota FUJIE, Tomoki NAKATSU, Rumi HIRAGA

2024Volume 4Issue 1 Article ID: SC-2024-10
Published: January 19, 2024
Released on J-STAGE: March 20, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2024-10

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

For one participant with conductive hearing loss, pure tone threshold was measured using a cartilage conduction transducer, and subjective evaluation of phoneme balance sentences was performed. The results of the experiment showed that the hearing level near the mastoid process was the best. Deterioration of pure tone hearing was observed in the following order: around the temple, around the external auditory canal, around the tragus, and the angle of the mandible. In evaluating phoneme balanced sentences, participants reported that it was easiest to hear when the vibrator was ﬁxed near the mastoid process.

View full abstract

Download PDF (1083K)
A survey on music listening by cochlear implant users

Akane MARUYAMA, Keiji TABUCHI, Rumi HIRAGA, Iku KOYANO, Hiroko TERASAW ...

2024Volume 4Issue 1 Article ID: SC-2024-11
Published: January 19, 2024
Released on J-STAGE: March 20, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2024-11

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

A questionnaire survey was conducted to investigate how cochlear implant users listen to and enjoy music, and to examine the factors associated with them. The results showed that 102 people responded to the questionnaire, indicating that many people continue to enjoy music after implantation. The results of a time series study showed that the enjoyment of music decreased with hearing loss, but improved to the same level as before hearing loss after implantation. Furthermore, while there was a trend toward lower ratings of sound quality and music with increasing age, it was suggested that music perception may be improved by practicing listening to music with cochlear implants. Future research should focus on improving subjective enjoyment.

View full abstract

Download PDF (2098K)
Reading auditory attention from eye information

Shimpei YAMAGISHI, Hsin-I LIAO, Yuta SUZUK, Shigeto FURUKAWA

2024Volume 4Issue 1 Article ID: SC-2024-12
Published: January 19, 2024
Released on J-STAGE: March 20, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2024-12

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

In the daily environment where multiple stimuli coexist, the brain selectively and effectively processes the relevant information by directing attention to the object of interest. While much knowledge has been accumulated on how visual attention shifts, which can be easily estimated by eye gaze measurement, there have been insufficient investigations on auditory attention. We have focused on information that can be measured from the eyes and have studied the relationship of auditory attention with changes in pupil size and tiny involuntary saccadic eye movements called microsaccades. Pupil size is known to change due to various factors: the most fundamental response is the pupillary light response, in which the pupil constricts depending on the amount of light input. However, recent studies have revealed that the pupil size changes depending on the luminance of the covertly attended position, even if the actual light input is constant. Microsaccades have also been reported to reflect the direction of covert attention in the visual domain. In this paper, we introduce our previous study [1] that showed the relationship between auditory attention and pupillary responses as well as visual attention.

View full abstract

Download PDF (710K)
Differences in Speech Rate Depending on the Formality of Conversation

－An Analysis using the Corpus of Everyday Japanese Conversation (CEJC)－

Chen LIANG, Kenji KURAKATA

2024Volume 4Issue 1 Article ID: SC-2024-13
Published: January 19, 2024
Released on J-STAGE: March 20, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2024-13

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

The speech rate is a non-linguistic element affecting the comprehensibility of a conversation. Unlike previous studies that focused on measuring the speech rate of read materials, we measured the speech rate in everyday Japanese conversations using a corpus (CEJC). In this study, we analyzed differences in speech rate based on the formality of conversation, in addition to the age and gender of speakers. The findings indicate that informal conversations generally exhibit a faster speech rate than formal ones among the younger generations. However, this difference is less pronounced among the older generations. A potential explanation for this phenomenon could be the presence of an anchor speech rate. In addition, gender differences in speech rate were more noticeable in informal conversations, with men speaking faster than women. In formal conversations, there was minimal systematic change in speech rate across generations, while older generations tended to speak more slowly in informal settings.

View full abstract

Download PDF (1473K)
Vocal Communication in Individuals who Experience Selective Mutism

Yuria TOMA, Miki TOYAMA, Soichiro MATSUDA

2024Volume 4Issue 1 Article ID: SC-2024-14
Published: January 19, 2024
Released on J-STAGE: March 20, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2024-14

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

We conducted an interview survey, a questionnaire survey, and a conversation experiment with those who experience Selective Mutism (SM), to clarify their difficulties in social situations and the characteristics of their speech behavior. Those who experience SM continued to have difficulties with speech even after they no longer met the diagnostic criteria for SM, suggesting that speech difficulties affect social interaction. In the experimental situation, those who experience SM were found to have a longer reaction latency. The long response latency may be a barrier to vocal communication in everyday situations as well. Future work is needed to clarify the environmental factors that influence the speech behavior of those who experience SM.

View full abstract

Download PDF (2254K)
Education to Caregivers of Young Children Who Stutter and Its Effect on Speech Rate

Keiko OCHI, Naomi SAKAI, Kohei KAKUTA

2024Volume 4Issue 1 Article ID: SC-2024-15
Published: January 19, 2024
Released on J-STAGE: March 20, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2024-15

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

In this study, we conducted an educational session for caregivers of young children who stutter, explaining the Demands and Capacities Model, a theory that explains the mechanism for the onset of stuttering, and educating them on how to interact with their young children. The relationship between self-assessment of speech rate before and two weeks after the education and actual recorded speech rate was investigated. The speech rate was based on recordings of daily conversations between young children and their caregivers. The results showed that speech rate decreased significantly after 2 weeks of the lecture, in terms of self-assessment as well as in terms of the observed measurements. At the two-week follow-up, the slower the actual speech rate of the caregiver was, the slower the young child's speech rate was, suggesting the influence of the caregiver's adjustment of the speech rate.

View full abstract

Download PDF (781K)
Speech disfluency and task specificity differences among adult fluent and non-fluent speakers

Daichi IIMURA, Osamu ISHIDA, Shoko MIYAMOTO

2024Volume 4Issue 1 Article ID: SC-2024-16
Published: January 19, 2024
Released on J-STAGE: March 20, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2024-16

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

We examine speech disfluency in 27 adults who stutter and 20 fluent adults across four tasks: oral reading, picture description, conversation, and story retelling. ‘Disfluencies’ are categorized into stuttering-like disfluencies (SLD) and normal disfluencies (NDF). Participants who stutter were stratified based on ‘ratio of disfluency’ (RDF), which helps assess cluttering. Results reveal varying SLD in each group, but all groups show similar amounts of NDF. Sound repetition is predominant in the stuttering group with SLD > 3, potentially indicating cluttering, unlike the block-dominated in SLD ≤ 3 groups. Differences in the primary types of disfluencies in disfluency disorders are thus highlighted.

View full abstract

Download PDF (1349K)
Characteristics of speech disfluency in stuttering and cluttering

Shoko MIYAMOTO

2024Volume 4Issue 1 Article ID: SC-2024-17
Published: January 19, 2024
Released on J-STAGE: March 20, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2024-17

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

Stuttering and cluttering are known to be similar fluency disorders, but different disorders, and the significance of differentiating between them has been reported. The importance of differentiating between stuttering and cluttering has been reported. Numerous findings have been accumulated regarding the similarities and differences between the two disorders, and these findings have been reflected in the definition of cluttering. On the other hand, there still seems to be an unresolved debate as to whether or not a language disorder exists in the case of cluttering. In this presentation, I would like to report the progress of my study on the useful differential diagnosis of stuttering and cluttering.

View full abstract

Download PDF (959K)

Register with J-STAGE for free!