Proceedings of the Technical Committee on Speech Communication

The automatic classification of the severity of depression by voice sound

Seiji MURANAKA, Yuko SHIGEEDA, So SUGITA, Masaya ITO

2023Volume 3Issue 4 Article ID: SC-2023-18
Published: September 14, 2023
Released on J-STAGE: February 15, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2023-18

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

In this study, we attempted to analyze the speech of patients undergoing the GRID-HAMD to see if they could be classified according to the severity of their depressive symptoms. Clinical psychology has been exploring assessment methods for mental disorders. Some speech features could be associated with severe depression symptoms, but it is still unveiled, especially in Japanese. In this study, we constructed a classification model of GRID-HAMD scores at each time point using 97 patients' audio from 4-time points of the assessment interview. The results confirmed that the Accuracy and F1-score are currently 0.52 and 0.49, respectively. Moreover, it was confirmed that spectral contrast and MFCC contribute to the classification.

View full abstract

Download PDF (1147K)
Development of a fatigue estimation system using voice and image recognition

Kaiki NISHIYAMA, Kaiki NISHIYAMA

2023Volume 3Issue 4 Article ID: SC-2023-19
Published: September 14, 2023
Released on J-STAGE: February 15, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2023-19

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

Fatigue is an inevitable part of daily life. In addition, the use of electronic devices for long periods of time is increasing due to changes in lifestyles, starting with the COVID-19 in 2020. These factors are likely to further cause fatigue in daily life and affect our lives. Therefore, we thought that a system to estimate the degree of fatigue might be useful, and focused on facial expressions and utterances, which are easily characterized by fatigue. In this study, we aim to develop a system that estimates the degree of physical or mental fatigue of a user using speech and image recognition technologies, and displays the results of the estimation to encourage the user to take a break if necessary. In this report, we present an overview of the proposed system and the results of preliminary identification experiments using vowels and phoneme-balanced sentence utterances during fatigue.

View full abstract

Download PDF (3207K)
Kalliope: Power and Spectrum Estimation of Periodic Signals Invariant to Short-time Window Positions

Shigeki SAGAYAMA

2023Volume 3Issue 4 Article ID: SC-2023-20
Published: September 14, 2023
Released on J-STAGE: February 15, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2023-20

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

Fundamental problems in speech analysis is discussed along with minimum requirements for voiced speech analysis. Short-time spectral analysis of periodic signals using period-multiple long von Hann or Hamming window is shown to yield the true values of power and harmonic power (i.e., peak values of the spectral fine structure) invariant to the relative positional relationship between the speech waveform. Arbitrary effective window length is also proposed. A preliminary discussion is given on the method of estimating the filter characteristics in the source filter model from a single harmonic spectrum and the learning method from multiple frames.

View full abstract

Download PDF (6070K)
Speech Recognition and Spatial Release from Masking for Those with Unilateral Hearing Loss

A Comparison of Hearing Under Anechoic and Reverberant Environments

Shinya TSUJI, Takayuki ARAI

2023Volume 3Issue 4 Article ID: SC-2023-21
Published: September 14, 2023
Released on J-STAGE: February 15, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2023-21

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

In this study, speech reception thresholds (SRT) and the degree of spatial release from masking (SRM) under anechoic and reverberant environments were measured under three hearing conditions namely, unilateral hearing loss (UHL), monaural normal hearing (MNH), and binaural normal hearing (BNH). Reverberation was estimated as a factor of impairing speech intelligibility for MNH and UHL, whereas SRT in BNH was not significantly affected. MNH and UHL showed poorer degree of SRM than BNH and the degree of SRM was negative when the target sound was located on the side of the impaired ear. However, participants with UHL were released from energetic masking under reverberation and the degree of SRM was improved. Results suggest that the hearing of people with UHL could be improved by the adaptive contribution to monaural cues.

View full abstract

Download PDF (1484K)
[title in Japanese]

Takuya ASAI, Hideaki KIKUCHI, Nobuyuki JINCHO

2023Volume 3Issue 4 Article ID: SC-2023-22
Published: September 14, 2023
Released on J-STAGE: February 15, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2023-22

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

In this study, an impression estimator for inside sales speech is developed. To control the linguistic information, simulated business meeting speech was collected. The collected speeches were evaluated by crowdsourcing. Observation of the impression evaluations revealed that more than a certain number of evaluators were engaged in called “Satisfice”. Therefore, appropriate evaluators were selected by using the evaluation time, agreement rate, and standard deviation. As a result, it is confirmed that the accuracy of the impression estimation model is improved when the evaluators are selected by using the evaluation time.

View full abstract

Download PDF (1442K)
Prosodic Attitude Recognition for Spoken Dialogue Systems: On the Use of the Corpus of Everyday Japanese Conversation

Kouki MIYAZAWA, Yoshinao SATO

2023Volume 3Issue 4 Article ID: SC-2023-23
Published: September 14, 2023
Released on J-STAGE: February 15, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2023-23

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

Processing paralinguistic messages is indispensable for spoken dialogue systems to facilitate natural interaction with humans. Toward this goal, we develop a prosodic attitude recognition (PAR) model to recognize four essential attitudes for deciding responses (i.e., agreement, disagreement, question, and stalling) from users'speech. In this study, we included an acted speech corpus for PAR we built and the corpus of everyday Japanese conversation (CEJC) in the data for training and evaluation. We identified patterns of paralinguistic features that are characteristic of everyday conversation and rarely appear in acted speech. We also demonstrated that our model can recognize such prosodic attitudes more accurately than the model trained solely using the acted speech corpus.

View full abstract

Download PDF (1546K)
Noise Effect on the Correlation between Fundamental Frequency (F0)

Difference and Speaker Identification Accuracy

Ryohei SUZUKI, Kanae AMINO, Takayuki ARAI

2023Volume 3Issue 4 Article ID: SC-2023-24
Published: September 14, 2023
Released on J-STAGE: February 15, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2023-24

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

In this study, 30 native Japanese speakers were examined using two types of noise with three levels of SNR (SNR = ∞,0 dB,−10 dB), and five unknown speakers were asked to di!erentiate between two types of speech. Based on the experiment results, a linear single regression analysis was conducted to analyze the relationship between the fundamental frequency (F0) di!erence between stimulus pairs and the identification rate in the presence and absence of noise, and the e!ect of the degree of noise. The parameters estimated by the linear single regression analysis confirmed the possibility that F0 contributes more to speaker recognition with noise than without noise, and that the change in the contribution of F0 to speaker recognition with the presence or absence of noise may be small.

View full abstract

Download PDF (1516K)
Revisiting protocol for making speech materials reusable

Preparation, recording, and Analysis

Hideki KAWAHARA, Ken-Ichi SAKAKIBARA, Mitsunori MIZUMACHI, Kohei YA ...

2023Volume 3Issue 4 Article ID: SC-2023-25
Published: September 14, 2023
Released on J-STAGE: February 15, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2023-25

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

We are studying to establish the recommended protocols for making speech materials reusable. We pre sented a protocol for preparation, recording, and analysis of speech materials in 2019. The technological advances and investigations in practical situations since then, we found several recommendations are outdated, and worse,erroneous. We describe those advances and issues and discuss how to revise recommended protocols and how to introduce them to related communities.

View full abstract

Download PDF (5648K)
To produce absolute pitch without perfect pitch

By recalling a melody which lies deep

Toru SUGIMOTO, Aiko TSUNEKAWA, Yuko YAMAJI, Kana NOGUCHI, Shinju TSUDA ...

2023Volume 3Issue 4 Article ID: SC-2023-26
Published: September 14, 2023
Released on J-STAGE: February 15, 2024

DOIhttps://doi.org/10.60274/asjsc.SC-2023-26

RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

Show abstractHide abstract

Without perfect pitch, reproducing accurate C or D can be challenging. However, if melodies which lie deep remain original, it becomes possible to reproduce their keys through recalling them. So, we examined whether recalling melodies in memory could reproduce the original key or not. As a result, it was found that melodies not particularly liked but stuck in the mind or instrument sounds might have a higher chance of being reproduced unchanged compared to songs often remembered, sung, or played. Furthermore, the ability to mentally transpose and enjoy music without perfect pitch reaffirmed the greatness of the brain's capabilities.

View full abstract

Download PDF (1252K)

Register with J-STAGE for free!