-
Mori KIYOTADA, Miyoshi YASUO
Article type: SIG paper
Pages: 01-04
Published: February 27, 2023
Released on J-STAGE: February 27, 2023
CONFERENCE PROCEEDINGS
FREE ACCESS
Many videos in various languages are posted on video-sharing websites such as YouTube, and watching them is a promising form of listening practice for second-language learners. However, many of these videos were not produced as listening materials, and some speakers have distinctive accents and other characteristics that make them difficult for learners to understand. For this reason, learners often adjust the playback speed to one that is easier for them to follow. This research aims to provide an environment in which learners can adjust the accent of the speaker in a video to be more like that of their mother tongue, making the speech easier to listen to and providing further scaffolding in combination with speed adjustment. We investigated generative adversarial networks (GANs) and other speech conversion methods for this purpose and conducted experiments using MelGAN-VC to convert speech. As a result, we confirmed that it is difficult to suppress noise to a level that does not disturb learners.
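A minimal sketch of the spectrogram-domain conversion pipeline the abstract describes, assuming librosa for mel-spectrogram extraction and a hypothetical pretrained MelGAN-VC-style generator; it is an illustration of the general approach, not the authors' implementation.

```python
# Illustrative spectrogram-to-spectrogram conversion pipeline.
# `generator` is a hypothetical trained network mapping the source accent to
# the target accent in the log-mel domain (not provided here).
import numpy as np
import librosa

def convert_accent(wav_path, generator, sr=22050, n_mels=80):
    y, sr = librosa.load(wav_path, sr=sr)
    # Source representation: log-mel spectrogram.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = np.log(np.clip(mel, 1e-5, None))
    converted = generator(log_mel)  # assumed to return a converted log-mel
    # Rough waveform reconstruction (Griffin-Lim) for listening checks.
    return librosa.feature.inverse.mel_to_audio(np.exp(converted), sr=sr)
```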
-
Iguchi KENTA, Kojima YUGO, Kobayashi YOH, Takai TOMORU, Hirata AYANO, ...
Article type: SIG paper
Pages: 05-10
Published: February 27, 2023
Released on J-STAGE: February 27, 2023
CONFERENCE PROCEEDINGS
FREE ACCESS
Today, COVID-19 has spread all over the world, and occasions for clapping have been increasing. Clapping, especially as a vehicle of communication, is therefore an interesting object of research. This research clarifies what features each instance of clapping by an audience has and what reasons can account for these features. The keywords of this research are individuality and interactivity: individuality refers to the rate at which a person claps without any influence from other people, and interactivity refers to the rate at which a person claps under the influence of other people. We observed three clapping events formed by more than 100 people and found several common features. First, no clear spreading of clapping was observed, and clapping may instead spread through aural signals. Second, clear patterns were found in how audiences stop clapping. Third, a sense of belonging affects clapping. This study is expected to be useful for future communication research.
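An illustrative sketch of how the individuality and interactivity rates defined in the abstract could be computed from clap onset times; the 0.2 s influence window and the data format are assumptions, not the authors' procedure.

```python
# Compute "individuality" and "interactivity" rates from clap onsets (seconds).
def clap_rates(own_onsets, others_onsets, window=0.2):
    influenced = 0
    for t in own_onsets:
        # A clap counts as "interactive" if someone else clapped just before it.
        if any(0 < t - s <= window for s in others_onsets):
            influenced += 1
    n = len(own_onsets)
    interactivity = influenced / n if n else 0.0
    individuality = 1.0 - interactivity if n else 0.0
    return individuality, interactivity

# Example: two of three claps follow another person's clap within the window.
print(clap_rates([1.00, 1.15, 2.50], [0.95, 1.10]))
```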
-
Hashimoto EKAI, Shun SHIRAMATSU, Sora MATSUMOTO, Hidekazu AOSHIMA
Article type: SIG paper
Pages: 11-14
Published: February 27, 2023
Released on J-STAGE: February 27, 2023
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, there has been an increasing movement away from traditional top-down organizational management toward flat organizational management in which members can perform their work more autonomously. However, compared to hierarchical organizations, overall management becomes more difficult when individual autonomy is respected. Therefore, this study aims to develop a matching mechanism that can place the right people in the right positions while respecting the autonomy of each individual. In this paper, we develop a prototype dialogue system on Slack that estimates user skills and interests and collects personal attribute tags. We also conducted evaluation experiments to verify the tagging performance. As a result, although skill tags could be collected, hope tags could not be sufficiently collected by our prototype system. As future work, we plan to generate appropriate questions to collect users' hope tags.
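A minimal sketch of collecting skill and hope tags through a Slack bot, using the slack_bolt library; the tag format ("#skill:python"), environment variable names, and dialogue flow are assumptions and simplify whatever the paper's prototype actually does.

```python
# Collect personal attribute tags posted in Slack messages.
import os
import re
from slack_bolt import App

app = App(token=os.environ["SLACK_BOT_TOKEN"],
          signing_secret=os.environ["SLACK_SIGNING_SECRET"])

user_tags = {}  # user_id -> {"skill": set(), "hope": set()}

@app.message(re.compile(r"#(skill|hope):(\S+)"))
def collect_tag(message, context, say):
    # For regex listeners, Bolt exposes the captured groups in context["matches"].
    kind, value = context["matches"]
    tags = user_tags.setdefault(message["user"], {"skill": set(), "hope": set()})
    tags[kind].add(value)
    say(f"Registered {kind} tag: {value}")

if __name__ == "__main__":
    app.start(port=3000)
```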
-
Mizukami ETSUO, Murata KAZUYO, Morimoto IKUYO
Article type: SIG paper
Pages: 15-18
Published: February 27, 2023
Released on J-STAGE: February 27, 2023
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Ito TAKAYUKI
Article type: SIG paper
Pages: 19
Published: February 27, 2023
Released on J-STAGE: February 27, 2023
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Sakai YU, Shiramatsu SHUN, Oda MOTOKI, Onochi MITSUHIRO
Article type: SIG paper
Pages: 20-23
Published: February 27, 2023
Released on J-STAGE: February 27, 2023
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Usuda YASUYUKI
Article type: SIG paper
Pages: 24-29
Published: February 27, 2023
Released on J-STAGE: February 27, 2023
CONFERENCE PROCEEDINGS
FREE ACCESS
This study demonstrates that an everyday conversation corpus can be seen as an archive of society, through a series of analyses of conversations in the Corpus of Everyday Japanese Conversation (CEJC). The CEJC, constructed by the National Institute for Japanese Language and Linguistics (NINJAL), consists of 200 hours of natural conversation recorded from 2016 to 2020. During that period, everyday life changed significantly due to COVID-19, and this is also reflected in the conversations in the corpus. To show the availability of the corpus as an archive, we analyze a few excerpts of office meetings recorded at a dental clinic at the beginning of the COVID-19 pandemic, in which the participants talk about infection prevention. The analysis shows that social concerns are dealt with in relation to the positions of the participants. The data in the corpus are substantial enough to be analyzed in detail, which is a strength of everyday conversation corpora as archives of society.
-
Shiramatsu SHUN, Suenaga AYAHA, Yoshimura YUKI, Ito TAKAYUKI
Article type: SIG paper
Pages: 30-37
Published: February 27, 2023
Released on J-STAGE: February 27, 2023
CONFERENCE PROCEEDINGS
FREE ACCESS
ChatGPT and GPT-3.5, released by OpenAI in 2022, have become a social phenomenon, sometimes described as the "democratization of AI," as they have been widely adopted by people with no connection to programming. Although the responses generated by these large-scale language models have a high probability of containing false or fake information, they can generate logical responses across a very wide range of domains. As long as the possibility of fakes is taken into account, such language models can be used to support ideation in the divergent phases of discussions and in idea workshops for solving social issues. Large-scale language models may also be used to generate questions and structure discussions, as human facilitators do. In this paper, we introduce an evaluation experiment of a prototype using GPT-3 and dialogue examples about solving social issues with ChatGPT, and consider the possibilities of such utilization.
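A hedged sketch of using a GPT-3-style completion API to generate facilitator-like questions for the divergent phase of a discussion; the prompt wording and model name are assumptions, and the call uses the legacy OpenAI Completion endpoint available at the time of the paper, not necessarily the authors' prototype.

```python
# Generate open-ended, facilitator-style questions from a discussion summary.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

def facilitator_questions(discussion_summary, n_questions=3):
    prompt = (
        "You are a workshop facilitator. Based on the discussion below, "
        f"ask {n_questions} open-ended questions that broaden the ideas.\n\n"
        f"Discussion:\n{discussion_summary}\n\nQuestions:"
    )
    resp = openai.Completion.create(
        model="text-davinci-003",  # assumed GPT-3-family model
        prompt=prompt,
        max_tokens=200,
        temperature=0.8,
    )
    return resp["choices"][0]["text"].strip()
```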
-
Takagi KENTO, Ryo INUI, Tsuyoshi YAMAMURA
Article type: SIG paper
Pages: 38-43
Published: February 27, 2023
Released on J-STAGE: February 27, 2023
CONFERENCE PROCEEDINGS
FREE ACCESS
SNS posts are an effective source of information because they cover a wide variety of content. However, posts on SNS contain unique expressions that differ from those used in newspapers and other media, so they are difficult to analyze with conventional natural language processing and require special handling. In this study, we focus on Split-Characters among these unique expressions. Split-Characters are expressions in which a single character is divided into multiple characters. In a previous study, OCR was used to process Split-Characters visually. However, because OCR identifies Split-Characters by character recognition alone, it uses no contextual information and does not consider the appropriateness of the corrected sentence. In this study, we propose methods for interpreting Split-Characters using contextual information, based on three models: N-gram, RNN, and BERT. We verify whether the proposed methods can convert Split-Characters into the correct characters.
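An illustrative use of a BERT masked language model to check whether merging a split character yields a plausible sentence in context; the model name and the example (treating "木毎" as a split form of "梅") are assumptions, not the paper's exact setup.

```python
# Score a merged-character candidate for a split span using a masked LM.
from transformers import pipeline

unmasker = pipeline("fill-mask",
                    model="cl-tohoku/bert-base-japanese-whole-word-masking")

def merge_score(text_with_split, split_span, candidate):
    # Replace the split characters with a mask and score the merged candidate.
    masked = text_with_split.replace(split_span, unmasker.tokenizer.mask_token)
    results = unmasker(masked, targets=[candidate])
    return results[0]["score"]

# Higher scores suggest the merged character fits the surrounding context.
print(merge_score("木毎の花が咲いた", "木毎", "梅"))
```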
-
Yasukawa HIROKI, Masahiro MIZUKAMI, Seitaro SHINAGAWA, Hiroaki SUGIYAM ...
Article type: SIG paper
Pages: 44-49
Published: February 27, 2023
Released on J-STAGE: February 27, 2023
CONFERENCE PROCEEDINGS
FREE ACCESS
For a response generation model that reflects a person's characteristics (e.g., interests and preferences) to work in practice, the model must acquire a person embedding space that can be interpolated, so that responses can be generated for a speaker intermediate between different persons, and the person embedding must be easy to control. In this study, we trained a model by mixing two types of dialogue data: a large amount of dialogue data with user identifiers, which is suitable for acquiring an interpolable person embedding space, and dialogue data with persona sentences (sentences describing a person's characteristics), which offer high controllability of person representation. We propose a dialogue model that can generate responses via this person embedding. To demonstrate the effectiveness of the proposed method, we compared it with a conventional response generation model that does not explicitly model person embeddings and evaluated the interpolability and controllability of the person embedding obtained by the proposed method.
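A minimal sketch of the interpolation idea in the abstract: blending two person embeddings to obtain an "intermediate speaker". The embedding dimension and the decoder call are hypothetical placeholders.

```python
# Linear interpolation between two person embeddings.
import numpy as np

def interpolate_persona(emb_a, emb_b, alpha):
    """Interpolate in the person embedding space (alpha in [0, 1])."""
    return (1.0 - alpha) * emb_a + alpha * emb_b

emb_a = np.random.randn(128)   # embedding of speaker A (placeholder)
emb_b = np.random.randn(128)   # embedding of speaker B (placeholder)
mid = interpolate_persona(emb_a, emb_b, 0.5)
# response = decoder.generate(context, persona_embedding=mid)  # hypothetical
```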
-
Takasaki MEGURU, Naoki YOSHINAGA, Masashi TOYODA
Article type: SIG paper
Pages: 50-55
Published: February 27, 2023
Released on J-STAGE: February 27, 2023
CONFERENCE PROCEEDINGS
FREE ACCESS
When a dialogue system has a long-term conversation with a person, it is desirable to generate responses that take past dialogue sessions into account. However, the conversation logs used for training dialogue systems do not necessarily contain many responses that consider the past dialogue context. It is therefore difficult to generate responses that fully respect the past dialogue context if the system is trained only by concatenating the past dialogue context with the current one. In this paper, we propose a multi-task learning method for response generation that forces the dialogue system to consider the past context adequately. The auxiliary self-supervised task is to generate the system-side utterance included in the past dialogue context most similar to the current context. In the experiment, we trained the proposed models on the Multi-session Twitter Dialogue Dataset and verified the effect of our data augmentation methods.
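A sketch of the retrieval step implied by the auxiliary task: find the past dialogue context most similar to the current one and take its system-side utterance as an additional generation target. TF-IDF cosine similarity and the data format are assumptions; the paper does not necessarily use this similarity measure.

```python
# Pick the auxiliary target utterance from the most similar past session.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def auxiliary_target(current_context, past_sessions):
    # past_sessions: list of (context_text, system_utterance) pairs
    texts = [ctx for ctx, _ in past_sessions] + [current_context]
    tfidf = TfidfVectorizer().fit_transform(texts)
    sims = cosine_similarity(tfidf[-1], tfidf[:-1]).ravel()
    best = sims.argmax()
    return past_sessions[best][1]  # system utterance used as the auxiliary target
```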
-
Mizukami ETSUO
Article type: SIG paper
Pages: 56-61
Published: February 27, 2023
Released on J-STAGE: February 27, 2023
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Yang SEUNGKYOO
Article type: SIG paper
Pages: 62-67
Published: February 27, 2023
Released on J-STAGE: February 27, 2023
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Nakamoto SEIYA, Horiuchi YASUO, Hara DAISUKE, Kuroiwa SHINGO
Article type: SIG paper
Pages: 68-73
Published: February 27, 2023
Released on J-STAGE: February 27, 2023
CONFERENCE PROCEEDINGS
FREE ACCESS
We analyze the phonemes of linear hand movements in Japanese Sign Language using data measured with high-precision optical motion capture. The analysis shows that, for words with movement phonemes in six directions (Up, Down, Outward, Toward, Right, and Left), the movement trajectories fall within a certain range, suggesting that the linguistically defined movement phonemes can be distinguished as movements in three-dimensional space.
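An illustrative classification of a linear hand movement into one of the six phonemic directions from a 3D motion-capture trajectory; the axis convention (x: right/left, y: up/down, z: outward/toward) is an assumption, not the paper's coordinate system.

```python
# Classify the dominant movement direction of a hand trajectory.
import numpy as np

DIRECTIONS = {("x", 1): "Right", ("x", -1): "Left",
              ("y", 1): "Up", ("y", -1): "Down",
              ("z", 1): "Outward", ("z", -1): "Toward"}

def movement_direction(trajectory):
    """trajectory: (T, 3) array of hand positions over time."""
    displacement = trajectory[-1] - trajectory[0]
    axis = int(np.argmax(np.abs(displacement)))   # dominant axis of motion
    sign = 1 if displacement[axis] > 0 else -1
    return DIRECTIONS[("xyz"[axis], sign)]

traj = np.array([[0.0, 0.0, 0.0], [0.01, 0.15, 0.02], [0.02, 0.30, 0.03]])
print(movement_direction(traj))  # dominant displacement is along +y -> "Up"
```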
-
Saito KOKI, Furuya YUKI, Ogura KOSUKE, Mitsuda KOH, Higashinaka RYUICH ...
Article type: SIG paper
Pages: 74-79
Published: February 27, 2023
Released on J-STAGE: February 27, 2023
CONFERENCE PROCEEDINGS
FREE ACCESS
Building common ground in dialogue is important for effective communication. Our previous study demonstrated that rich modality and close social relationships between workers can facilitate building common ground in a remote collaborative task. In this study, we analyzed the factors that contribute to building common ground using the dialogue data collected in our previous study. The results showed that switching pauses were significantly longer in the condition where the modality was rich and the workers were close than in the other conditions. When the switching pause was longer, one worker tended to respond more thoughtfully to the other worker's questions. These findings suggest that the number of utterances containing concrete information about the collaborative task is a key factor in building common ground smoothly.
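A minimal sketch of measuring switching pauses (the gap between one worker's utterance ending and the other worker's next utterance starting); the (speaker, start, end) tuple format is an assumption about how the dialogue data might be represented.

```python
# Compute switching pauses from a time-ordered list of utterances.
def switching_pauses(utterances):
    pauses = []
    for prev, curr in zip(utterances, utterances[1:]):
        if prev[0] != curr[0]:                 # speaker change
            pauses.append(curr[1] - prev[2])   # next start minus previous end
    return pauses

utts = [("A", 0.0, 1.0), ("B", 1.5, 3.0), ("B", 3.25, 4.0), ("A", 5.0, 5.5)]
print(switching_pauses(utts))  # [0.5, 1.0] seconds
```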
-
Mizutani RINTARO, Suzuki HISASHI
Article type: SIG paper
Pages: 80-85
Published: February 27, 2023
Released on J-STAGE: February 27, 2023
CONFERENCE PROCEEDINGS
FREE ACCESS
Attentive listening is an effective communication technique for rapport building. Our project developed a multimodal attentive listening system with a 3D-visible avatar. The proposed system projects a 3D avatar on a naked-eye stereoscopic display in order to effectively communicate nonverbal information such as posture mirroring. The system supports spoken communication based on attentive-listening responses such as backchannels, expressions of empathy, repeats, modality-based responses, and mutual questions. Furthermore, it conveys nonverbal information through multiple channels, including facial expressions, eye gaze, blinking, nodding, and posture mirroring. In this article, we report an experiment with 60 participants, consisting of university undergraduate and graduate students. Each participant talked freely with the system for three minutes in a one-on-one setting about hobbies, favorite and disliked foods, future goals, worries, how they spend their holidays, childhood memories, club activities, part-time jobs, failures, and other topics, and then responded to a 22-item sensitivity evaluation. The results show that the avatar's listening attitude, such as the way she listens and shows empathy, was highly rated by the participants. Moreover, the majority of participants felt that the avatar was friendly and that they wanted to talk to her.
-
Mitui RIKUYA, Horiuchi YASUO, Kuroiwa SHINGO
Article type: SIG paper
Pages: 86-91
Published: February 27, 2023
Released on J-STAGE: February 27, 2023
CONFERENCE PROCEEDINGS
FREE ACCESS
We propose a set of features that describe the degree of variation in prosody, in terms of voice pitch, speed, and timbre, for pairs of neutral and emotional speech with the same utterance content. These features can be measured as time series and can also serve as utterance-level features when averaged over the entire utterance. Regression analysis of the three features against emotional voice intensity ratings showed that each feature is valid for expressing the intensity of prosody. Some examples of time-series analysis of prosody using these features are also shown.
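An illustrative computation of a pitch-variation feature for a neutral/emotional utterance pair, sketching the general idea of comparing F0 statistics between the two; the F0 extractor settings and the ratio-based definition are assumptions and may differ from the paper's feature definitions.

```python
# Compare F0 variability between a neutral and an emotional utterance.
import numpy as np
import librosa

def f0_contour(path, sr=16000):
    y, sr = librosa.load(path, sr=sr)
    f0, voiced, _ = librosa.pyin(y, fmin=80, fmax=400, sr=sr)
    return f0[voiced]                      # keep voiced frames only

def pitch_variation(neutral_path, emotional_path):
    f0_n, f0_e = f0_contour(neutral_path), f0_contour(emotional_path)
    # Degree of variation: how much wider the emotional F0 spread is.
    return np.std(f0_e) / (np.std(f0_n) + 1e-8)
```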
-
Fukuda MIKITO, Arimoto YOSHIKO
Article type: SIG paper
Pages: 92-97
Published: February 27, 2023
Released on J-STAGE: February 27, 2023
CONFERENCE PROCEEDINGS
FREE ACCESS
This report examined the psychological and physiological effects of game events generated in response to a player's laughter, measuring players' heart rate (HR), skin conductance level (SCL), zygomaticus major activity (ZYG), and corrugator supercilii activity (COR) to elucidate whether a virtual world that responds to players' laughter attracts them more. Participants played an online game under two conditions while their HR, SCL, ZYG, and COR were recorded. In the laugh-event condition, the system responds to the player's laughter with a game event; in the non-laugh-event condition, the system presents game events when the player is not laughing. A three-way analysis of variance was performed on the HR, SCL, ZYG, and COR signals to test the hypothesis that each physiological response varies over time between event presentations (laugh/non-laugh) and between event types (advantageous/disadvantageous). As a result, presenting an event in response to the player's laughter decreased HR, significantly activated SCL, and significantly deactivated ZYG, whereas presenting an event to non-laughing players decreased HR and significantly activated ZYG and COR. These results suggest that the presentation of game events makes laughing players more emotionally aroused while suppressing their pleasant emotions, and that the events affect non-laughing players differently.
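A hedged sketch of the three-way ANOVA layout described above (factors: event presentation, event type, and time window), applied to one signal such as SCL; the long-format column names are assumptions, and the repeated-measures structure is ignored here for brevity.

```python
# Three-way ANOVA on one physiological signal using statsmodels.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# df: long-format data with columns
#   "scl"          : mean skin conductance level in a time window
#   "presentation" : "laugh" or "non_laugh"
#   "event_type"   : "advantageous" or "disadvantageous"
#   "time"         : time-window index
def three_way_anova(df: pd.DataFrame) -> pd.DataFrame:
    model = ols("scl ~ C(presentation) * C(event_type) * C(time)", data=df).fit()
    return sm.stats.anova_lm(model, typ=2)
```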