Proceedings of the Technical Committee on Speech Communication
Online ISSN : 2758-2744
Volume 3, Issue 2
Displaying 1-7 of 7 articles from this issue
  • Yuki Fukushima, Motoharu Tajima, Hironori Takemoto
    2023, Volume 3, Issue 2, Article ID: SC-2023-7
    Published: February 24, 2023
    Released on J-STAGE: February 15, 2024
    RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

    The finite-difference time-domain (FDTD) method can calculate the acoustic properties of a geometrical model of the nasal and paranasal cavities extracted from CT data. The calculated results should be validated by acoustic measurements of a physical model constructed from the same CT data. Such measurements, however, have so far been unsuccessful, because the measurement signals input through the nostrils are not observed at the glottis with a sufficient signal-to-noise ratio. To overcome this problem, the present study introduced an exponential horn, which supplies measurement signals at high amplitude, and a physical model with thick walls to suppress wall vibration. As a result, a transfer function was successfully measured and used to evaluate the calculated one. The evaluation implied that the fine structure of the paranasal cavities could not be reproduced with sufficient accuracy in the physical model.

    Download PDF (1535K)
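    The class of method named in the abstract above can be illustrated with a minimal one-dimensional FDTD sketch of acoustic propagation in a uniform tube. All grid sizes and constants here are illustrative assumptions; the paper itself uses a full 3-D model extracted from CT data.

    ```python
    import numpy as np

    # Minimal 1-D FDTD sketch of acoustic wave propagation in a uniform tube.
    # Illustrative only -- not the authors' 3-D nasal/paranasal model.
    C = 343.0          # speed of sound [m/s]
    RHO = 1.2          # air density [kg/m^3]
    DX = 1e-3          # spatial step [m]
    DT = DX / (2 * C)  # time step; Courant number 0.5 keeps the scheme stable

    N = 200                 # number of pressure cells (a 0.2 m tube)
    p = np.zeros(N)         # pressure grid
    u = np.zeros(N + 1)     # particle-velocity grid (staggered by half a cell)

    for n in range(400):
        # source: a Gaussian pulse injected at the tube entrance
        p[0] += np.exp(-((n - 40) / 10.0) ** 2)
        # update velocity from the spatial pressure gradient
        u[1:-1] -= DT / (RHO * DX) * (p[1:] - p[:-1])
        # update pressure from the velocity divergence
        p -= RHO * C**2 * DT / DX * (u[1:] - u[:-1])

    print(f"peak pressure after {n + 1} steps: {p.max():.3f}")
    ```

    Measuring a transfer function then amounts to exciting the model at one end and observing the pressure response at the other, which is exactly what the physical measurement in the paper attempts.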
  • Hideki KAWAHARA, Ken-Ichi SAKAKIBARA, Nao HODOSHIMA, Hideki BANNO, ...
    2023, Volume 3, Issue 2, Article ID: SC-2023-8
    Published: February 24, 2023
    Released on J-STAGE: February 15, 2024
    RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

    The physical instantiation of speech communication depends on the acoustic environment. The acoustic environment modifies not only listening behavior but also speech production behavior, making speech attributes different. Establishing a precisely controllable acoustic environment simulator is necessary to investigate such effects experimentally. Enabling flexible, interactive manipulation of such a simulation environment, built on real-time signal processing, will help researchers efficiently acquire tacit and deep knowledge of speech communication and investigate and quantify these effects by experiment. This paper introduces preliminary investigations and implementations of such tools to stimulate discussion.

    Download PDF (3856K)
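    The core operation of such an acoustic-environment simulator can be sketched as convolving a signal with a room impulse response and mixing dry and reverberant paths. The impulse response below is a synthetic decaying-noise stand-in, not a measured room, and the mix ratio is an assumed control parameter.

    ```python
    import numpy as np

    # Sketch of an acoustic-environment simulator's core: convolution with a
    # room impulse response (RIR). The RIR is synthetic (exponentially decaying
    # noise), a stand-in for a measured or simulated room response.
    rng = np.random.default_rng(0)
    fs = 16000                                 # sampling rate [Hz]
    tail = fs // 4                             # 250 ms reverberant tail
    rir = rng.standard_normal(tail) * np.exp(-np.arange(tail) / (fs * 0.05))
    rir[0] = 1.0                               # direct path

    speech = rng.standard_normal(fs)           # placeholder for a speech signal
    wet = np.convolve(speech, rir)[: len(speech)]

    # mix dry and reverberant paths; the ratio is the controllable parameter
    ratio = 0.5
    out = (1 - ratio) * speech + ratio * wet
    print(f"output RMS: {np.sqrt(np.mean(out**2)):.3f}")
    ```

    A real-time version would process audio block by block with overlap-add rather than in one offline convolution, but the signal path is the same.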
  • -Results of a preliminary survey by native Japanese speakers-
    Lae Lae Htun, Tetsuya SHIMAMURA, Mee SONU
    2023, Volume 3, Issue 2, Article ID: SC-2023-9
    Published: February 24, 2023
    Released on J-STAGE: February 15, 2024
    RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

    Our aim was to understand the scientific features of emotional expressions in Japanese by native and non-native speakers. Specifically, this study focused on how non-native speakers recognize the emotional expressions of native speakers and vice versa. As a first attempt, the present study analyzed various emotional expressions using the one-word utterance “n” by young female native Japanese speakers. Based on the preliminary survey, the analysis showed three types of F0 dynamic pattern in “n.” Positive and agreeable emotions exhibited a “Rise and Fall” pattern, negative emotions exhibited a “Gradual Fall” pattern, and doubtful emotions exhibited a “Rise” pattern. The minimum F0 did not differ considerably across these emotions, whereas the maximum F0 was high for all emotions except negative ones. These results suggest that F0 movement may be related to emotional expression, that is, positive versus negative.

    Download PDF (1123K)
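    The three contour shapes reported above can be caricatured by a toy rule that classifies an F0 contour by the position of its peak. The rule and the example contours are illustrative assumptions, not the paper's analysis procedure.

    ```python
    # Toy classifier for the three F0 contour shapes reported in the paper.
    # The rule (peak position) and the contours are illustrative assumptions.
    def classify_f0(contour):
        peak = contour.index(max(contour))
        if peak == 0:                     # peak at onset -> falling overall
            return "Gradual Fall"
        if peak == len(contour) - 1:      # peak at offset -> rising overall
            return "Rise"
        return "Rise and Fall"            # internal peak

    rise_fall = [120, 160, 210, 180, 130]   # positive/agreeable pattern [Hz]
    fall      = [210, 190, 160, 140, 120]   # negative pattern [Hz]
    rise      = [120, 130, 150, 180, 220]   # doubtful pattern [Hz]

    for name, c in [("rise_fall", rise_fall), ("fall", fall), ("rise", rise)]:
        print(name, "->", classify_f0(c))
    ```

    A real analysis would work on extracted F0 tracks with smoothing and voicing decisions rather than on clean five-point contours.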
  • Matsuri Yasuda, Tatsuya Kitamura
    2023, Volume 3, Issue 2, Article ID: SC-2023-10
    Published: February 24, 2023
    Released on J-STAGE: February 15, 2024
    RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

    Expressions associated with the voice quality of female cartoon voice actors were extracted to reduce the mismatch between their voice quality and the characters they play in animated films and video games. We first collected expressions describing voice quality and then investigated their understandability, synonymity, and similarity. Five expression pairs were extracted through a cluster analysis.

    Download PDF (1404K)
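    The pair-extraction step can be sketched on toy data: given pairwise similarity ratings between voice-quality expressions, keep the pairs that are each other's nearest neighbour. The expressions and ratings below are hypothetical, and mutual-nearest-neighbour pairing stands in for the cluster analysis used in the paper.

    ```python
    # Sketch of extracting expression pairs from similarity judgments.
    # Expressions and ratings are hypothetical; mutual-nearest-neighbour
    # pairing stands in for the paper's cluster analysis.
    expressions = ["clear", "transparent", "husky", "breathy", "cute"]
    ratings = {  # symmetric similarity ratings in [0, 1] (hypothetical)
        ("clear", "transparent"): 0.90,
        ("husky", "breathy"): 0.85,
        ("clear", "cute"): 0.40,
    }

    def mutual_pairs(expressions, sim):
        """Return pairs of expressions that are each other's nearest neighbour."""
        def s(a, b):
            return sim.get((a, b), sim.get((b, a), 0.0))
        nearest = {e: max((x for x in expressions if x != e),
                          key=lambda x: s(e, x))
                   for e in expressions}
        return sorted({tuple(sorted((a, b))) for a, b in nearest.items()
                       if nearest[b] == a})

    print(mutual_pairs(expressions, ratings))
    ```

    Here “cute” is most similar to “clear” but the relation is not mutual, so it remains unpaired, mirroring how clustering leaves weakly related items outside the extracted pairs.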
  • Shintaro SODEYA, Motoyuki SUZUKI
    2023, Volume 3, Issue 2, Article ID: SC-2023-11
    Published: February 24, 2023
    Released on J-STAGE: February 15, 2024
    RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

    In recent years, methods using GPT have attracted attention in the research field of chat dialogue systems. In such methods, responses can be generated automatically by inputting the situation and context into GPT, but it is not always possible to generate natural responses; this may cause errors such as nonsensical remarks. Therefore, methods have been proposed in which multiple candidate response sentences are generated by GPT and an appropriate one is selected from among them using a selection model. In this paper, we used SentenceBERT as a feature extractor, retrained it with a small amount of data to create a model that selects responses consistent with the context and the speaker, and evaluated its performance.

    Download PDF (1296K)
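    The selection step described above can be sketched as scoring each generated candidate against the dialogue context and picking the best one. In this toy version a bag-of-words cosine similarity stands in for the SentenceBERT encoder, and the context and candidates are invented examples.

    ```python
    from collections import Counter
    import math

    # Sketch of candidate selection: GPT proposes several responses and a
    # scorer picks the one most consistent with the context. Bag-of-words
    # cosine similarity stands in for the SentenceBERT encoder of the paper.
    def embed(text):
        return Counter(text.lower().split())

    def cosine(a, b):
        dot = sum(a[w] * b[w] for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def select_response(context, candidates):
        ctx = embed(context)
        return max(candidates, key=lambda c: cosine(ctx, embed(c)))

    context = "I went hiking in the mountains last weekend"
    candidates = [
        "The weather in the mountains must have been lovely",
        "I prefer cooking pasta at home",
        "Stock prices fell sharply today",
    ]
    print(select_response(context, candidates))
    ```

    Swapping the stand-in `embed` for a retrained sentence encoder, and scoring against speaker history as well as context, gives the shape of the model the paper evaluates.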
  • Naoki KANAZAWA, Motoyuki SUZUKI
    2023, Volume 3, Issue 2, Article ID: SC-2023-12
    Published: February 24, 2023
    Released on J-STAGE: February 15, 2024
    RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

    In recent years, there have been several studies on speech generation from lip videos. Many conventional methods use DNN models based on CNNs or RNNs to generate speech waveforms. In such methods, the model learns speaker-specific features such as skin color and moles, and performance degrades when data from speakers other than the training speaker is used as input. We therefore proposed a method that removes speaker-specific features from the input features in order to generate speech waveforms with high performance for any speaker. In this paper, we generated speech waveforms using the proposed input features and evaluated them using STOI. As a result, the performance of the proposed method was worse than that of the raw lip-video input method, but we confirmed its effectiveness in suppressing the degradation caused by differences between speakers.

    Download PDF (1707K)
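    One simple instance of removing speaker-specific components from input features is per-speaker mean/variance normalization, sketched below on synthetic data. This is an assumed illustration of the idea; the paper's actual feature transform is not reproduced here.

    ```python
    import numpy as np

    # Sketch of suppressing speaker-specific components in input features via
    # per-speaker mean/variance normalization (an assumed, simple instance of
    # the idea -- not the paper's exact transform).
    def normalize_per_speaker(features):
        """features: dict mapping speaker id -> (frames, dims) array."""
        out = {}
        for spk, x in features.items():
            mu, sigma = x.mean(axis=0), x.std(axis=0) + 1e-8
            out[spk] = (x - mu) / sigma   # removes speaker-level offset/scale
        return out

    rng = np.random.default_rng(0)
    # two speakers whose raw features differ by a constant speaker offset
    feats = {"spkA": rng.standard_normal((100, 4)) + 5.0,
             "spkB": rng.standard_normal((100, 4)) - 3.0}
    norm = normalize_per_speaker(feats)
    print({s: float(x.mean().round(6)) for s, x in norm.items()})
    ```

    After normalization the constant per-speaker offset is gone, so a downstream waveform generator sees feature distributions that no longer identify the speaker.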
  • Nagisa KATO, Ayako SHIROSE
    2023, Volume 3, Issue 2, Article ID: SC-2023-13
    Published: February 24, 2023
    Released on J-STAGE: February 15, 2024
    RESEARCH REPORT / TECHNICAL REPORT RESTRICTED ACCESS

    Among the acts of reading a text, "initial reading" is the first encounter with it. The method of initial reading, that is, how a text is read at the beginning of a unit, particularly requires consideration of its effect and purpose. Therefore, in order to understand which methods of initial reading, oral or silent, are common in actual educational settings, and how the method of initial reading is selected and judged, we surveyed teachers' textbook instruction manuals and conducted a questionnaire survey of Japanese language teachers.

    Download PDF (1022K)