-
Shiki SATO, Yuta SASAKI, Shinji IWATA, Takato YAMAZAKI, Masato KOMURO, ...
Article type: SIG paper
Pages: 01-08
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
Building upon the success of the six previous Dialogue System Live Competitions, we organized the seventh edition, "Dialogue System Live Competition 7." This competition series aims to highlight the challenges and limitations of human-computer dialogue in a live event setting. As in the prior edition, our focus was on multimodal dialogue systems. The competition featured two tracks: the "Situation Track" and the "Task Track." The Situation Track aims to develop human-like dialogue systems for specific scenarios, while the Task Track focuses on creating dialogue systems capable of completing complex and advanced tasks. In the preliminary round, 14 teams competed in the Situation Track and 8 teams in the Task Track. This paper provides an overview of the event and reports the results from the preliminary round. The final round is scheduled to be held as a live event at the 103rd Meeting of the Special Interest Group on Spoken Language Understanding and Dialogue Processing.
-
Masako OGATA, Yuri NAKAMURA, Shio ARIMA, Hirofumi KIKUCHI
Article type: SIG paper
Pages: 09-12
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This paper reports on the control mechanisms and dialogue strategies adopted in the dialogue system developed for the Dialogue System Live Competition 7 (Situation Track). In this competition, participants were required to develop a system that listens to users' complaints and supports their decision-making. To effectively engage with users' grievances, it is essential to encourage self-disclosure. Therefore, in this study, we designed the system to exhibit a character personality commonly associated with the so-called "gal" archetype, which actively expresses positive and empathetic interest toward users. Furthermore, to assist users in making decisions, we incorporated a role-playing strategy aimed at fostering awareness and understanding of emotions. As a result, our system achieved sixth place in the competition's preliminary round.
-
Chikara HANAKAWA, Kota YAMAMOTO, Sota KOBORI, Shinya FUJIE
Article type: SIG paper
Pages: 13-18
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
In this report, we discuss the design of prompts for a large language model in a spoken dialogue system that listens to complaints and supports decision-making. The system explicitly divides the dialogue into three phases: listening to complaints, supporting decision-making, and casual conversation. Information relevant to each phase is structured into slots, ensuring consistency in system behavior and facilitating appropriate phase transitions. Moreover, the system continuously estimates the underlying emotions behind the speaker's utterances, rather than merely capturing their surface meaning, and generates responses accordingly. This enhances the naturalness of empathetic and listening behaviors, such as expressing sympathy and asking relevant questions. Additionally, to improve the coherence of system behavior, we introduce both general considerations that apply across all phases and phase-specific guidelines, reducing subtle unnaturalness and discomfort. We demonstrate the effectiveness of this design by presenting actual prompts and dialogue examples.
-
Koushiro NUMADA, Yusuke OGAWA, Takuto HIRAKAWA, Keisuke KAMEYAMA, Kent ...
Article type: SIG paper
Pages: 19-23
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
We describe the prompt used in the system developed for the situation track at the 7th Dialogue System Live Competition. We observed that a prompt containing only the situational information specified by the organizers tends to lead to superficial conversation without engaging in concrete consultation. To address this issue, we designed a prompt that asks questions to clarify the user's intention to maintain the relationship with their friend and to elicit concrete personal episodes while expressing empathy. We embedded one example of person-to-person dialogue based on the situation into the prompt. In addition, by annotating the agent's voice style, emotions, and gestures for each utterance in the example dialogue, we attempted to achieve multimodal behaviors consistent with the system's responses.
-
Jundai SUZUKI, Junghoon LEE, Shizuya OSAWA, Eisaku MAEDA
Article type: SIG paper
Pages: 24-28
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Hiroki MATSUOKA, Shigehisa FUJITA, Atsushi HORIGUCHI, Kazuki MORI, Sac ...
Article type: SIG paper
Pages: 29-33
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
The dysfunctional thought record method, widely used in cognitive behavioral therapy (CBT), is a technique that involves describing thoughts and emotions related to a specific event, identifying cognitive distortions within them, and guiding individuals toward more realistic thinking. This approach is particularly expected to be useful in addressing difficult situations in interpersonal relationships. In this study, we (Team Careco) developed a multimodal dialogue system incorporating the thought record method in collaboration with users of disability welfare service facilities and participated in the preliminary round of the Situation Track at Dialogue System Live Competition 7. Our system is designed to analyze user concerns using GPT-4o, a generative AI, and automatically present questions aligned with the thought record method. Furthermore, it assesses the potential presence of cognitive distortions in users' responses and aims to support their decision-making. In the preliminary round, our system ranked 7th out of 14 teams.
-
Taiga MORI
Article type: SIG paper
Pages: 34-39
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This paper describes our system, which is designed based on the concept of membership categories in conversation analysis. The activity of consultation is carried out by participants who assume the two categories of "advisor" and "advisee." In many cases, the advisor is an expert, while the advisee is a non-expert, and the asymmetry in their knowledge defines their respective categories. However, in consultations on general topics, there is not necessarily a knowledge gap between the participants. To address this, the proposed system refers to the categories of "those who have experienced the problem" and "those currently experiencing the problem" by narrating the experience of dealing with a problematic friend, thereby establishing an asymmetric relationship and providing advice.
-
Moe NAGAO, Koichiro TERAO, Yuhi OGA, Naoto IWAHASHI, Yuta SASAKI, Taka ...
Article type: SIG paper
Pages: 40-45
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Sota KOBORI, Chikara HANAKAWA, Setsu ITO, Shinya FUJIE
Article type: SIG paper
Pages: 46-49
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
For the control strategy of a spoken dialogue system, we propose a method based on state management using multiple evaluation metrics and turn management with utterance completion prediction. The dialogue management is divided into two phases: condition confirmation and tourist spot explanation, each controlled by different evaluation metrics. The condition confirmation phase employs three metrics to gradually refine user requirements: condition ambiguity indicating search condition clarity, concretization difficulty representing requirement specification ease, and search condition sufficiency showing tourist spot search feasibility. The explanation phase provides tourist spot information while dynamically evaluating three metrics: rejection level indicating negative response strength, question detection determining question presence, and peripheral search necessity assessing the need for surrounding facility information. The system transitions between phases based on these evaluation metrics, flexibly updating search conditions in response to user reactions. The turn management system utilizes prediction based on utterance completion and context consideration, enabling responses at appropriate timing based on sentence-final expressions and utterance content. These implementations maintain natural dialogue flow while achieving effective tourist spot recommendations through systematic evaluation and control.
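The phase control described in this abstract can be sketched as a threshold rule over the condition-confirmation metrics. This is an illustrative reconstruction, not the authors' implementation: the metric names, value ranges, and thresholds below are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ConditionMetrics:
    """Metrics named for the condition-confirmation phase. Value ranges
    (0.0-1.0) and field names are illustrative assumptions."""
    ambiguity: float       # search-condition clarity (lower is clearer)
    concretization: float  # difficulty of specifying requirements
    sufficiency: float     # feasibility of a tourist-spot search

def next_phase(m: ConditionMetrics,
               ambiguity_max: float = 0.3,
               sufficiency_min: float = 0.7) -> str:
    """Transition to the explanation phase only once the search conditions
    are clear enough and sufficient for a tourist-spot search; otherwise
    keep refining them with the user."""
    if m.ambiguity <= ambiguity_max and m.sufficiency >= sufficiency_min:
        return "explanation"
    return "condition_confirmation"
```

A similar rule over the explanation-phase metrics (rejection level, question detection, peripheral-search necessity) would drive the transition back when the user rejects a recommendation.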
-
Keisuke KAMEYAMA, Syunsuke FUKAZAWA, Kenta YAMAMOTO, Kazunori KOMATANI
Article type: SIG paper
Pages: 50-55
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This paper describes our system submitted for the task track of the Dialogue System Live Competition 7. In task-oriented dialogue systems, dialogue management with an LLM sometimes fails to achieve task completion. To address this problem, we combine rule-based dialogue management by DialBB with response generation by an LLM. This enables the system to reliably determine the tourist attraction while maintaining flexibility in conversation. In addition, we use a VAD instead of the Remdis built-in VAP, because the latter caused unnecessary interruptions while the user was speaking. In the preliminary round, our system received positive feedback from users when the dialogue completed without trouble.
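The division of labor described in this abstract (rules guarantee task progress, the LLM supplies flexible phrasing) can be sketched as follows. This is a minimal illustration, not DialBB's actual API: the state names, transition table, and stubbed LLM call are assumptions.

```python
# A rule-based state machine (standing in for DialBB) decides *what* the
# system does next; an LLM (stubbed here) decides *how* to phrase it.
TRANSITIONS = {
    ("ask_area", "area_given"): "ask_interest",
    ("ask_interest", "interest_given"): "recommend_spot",
    ("recommend_spot", "accepted"): "done",
    ("recommend_spot", "rejected"): "ask_interest",
}

def manage(state: str, user_act: str) -> str:
    """Rules guarantee the task always progresses toward a decided spot,
    regardless of how the LLM phrases each individual response."""
    return TRANSITIONS.get((state, user_act), state)

def render(state: str, llm=lambda p: f"[LLM response for: {p}]") -> str:
    """The LLM only generates surface text for the state chosen by rules."""
    return llm(f"Generate a friendly utterance for dialogue state '{state}'")
```

The hybrid keeps the LLM from wandering off-task: even a poor generation cannot change the dialogue state, only its wording.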
-
Nozomi KIMATA, Qiujie WANG, Yoshiki TOMITA, Haruki HATAKEYAMA, Eisaku ...
Article type: SIG paper
Pages: 56-61
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
The system is composed of three modules: summarization, transition, and searching. In the summarization module, a large language model (LLM) summarizes past dialogue content to generate the necessary text for the next response. This enables the LLM to comprehensively understand the dialogue history, leading to more coherent and contextually relevant responses. The transition module allows the LLM to reference the summary and determine whether the objective of each phase has been achieved. If necessary, it facilitates the transition to the next phase. By employing appropriate prompts for each phase, the system ensures step-by-step, goal-oriented response generation. The search module is activated when suggesting tourist destinations. It estimates user preferences based on the summarized information and searches for destinations that align with those preferences. By integrating these three modules, the system maintains dialogue consistency while delivering personalized tourist recommendations that effectively meet user needs.
-
Tamotsu MIYAMA, Shogo OKADA
Article type: SIG paper
Pages: 62-67
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This paper reports on the travel guidance dialogue system developed for the Task Track of Dialogue System Live Competition 7. This system is based on Remdis and MMDAgent-EX and is designed to assist users in selecting an optimal travel destination by providing diverse candidate sightseeing locations. To manage the dialogue flow, the system utilizes a state transition diagram, dynamically generating and assigning prompts to effectively control interactions. For selecting candidate sightseeing locations, it employs GPT-4o fine-tuned with supervised learning using sightseeing data and applies the Chain-of-Thought (CoT) method to extract relevant locations. Additionally, the system incorporates Travel Viewer, which displays a standard set of 45 sightseeing images and maps on the screen, along with their descriptions, to support users in making informed travel decisions. Furthermore, this paper analyzes user evaluations from the preliminary round of Dialogue System Live Competition 7 and explores the design of an optimal travel guidance chatbot.
-
Chihiro ASADA, Mika ONUKI, Yuto GOTO, Masaki NOSE
Article type: SIG paper
Pages: 68-73
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This paper presents our spoken dialogue system developed for the 7th Dialogue System Live Competition. To accomplish the competition's travel-planning task while providing a better user experience, we designed the system to present evidence that matches user preferences using retrieval-augmented generation, in addition to various gestures and facial expressions. The system consists largely of three blocks. The first determines the next dialogue state and task based on evaluation results, enabling the system to respond flexibly to various topics. The second incorporates possible responses into prompts and generates convincing proposals with supporting images and maps. The last generates expressive turn-taking and virtual-agent behavior such as facial expressions, gestures, and backchannels. As a result, we placed second in the preliminary round with high evaluation scores.
-
Yuhi OGA, Koichiro TERAO, Moe NAGAO, Naoto IWAHASHI, Yuta SASAKI, Taka ...
Article type: SIG paper
Pages: 74-79
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Dong WANG
Article type: SIG paper
Pages: 80-85
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This study examines the potential applications of large language models (LLMs) in language education. To evaluate their ability to identify the compatibility between grammatical items and example sentences, we designed a task and conducted experiments. Using multiple LLMs, we compared their performance based on accuracy, false negative rate (FN rate), false positive rate (FP rate), and Balanced Score. Additionally, we confirmed that synthetic data could serve as a practical alternative. Future research should focus on developing high-quality synthetic data generation methods and expanding their applicability. The findings of this study are expected to contribute to the establishment of benchmarks for evaluating the grammatical competence of LLMs in natural language.
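The measures named in this abstract can all be derived from a binary confusion matrix, as sketched below. The paper's exact Balanced Score definition is not given here, so balanced accuracy (the mean of the true-positive and true-negative rates) is assumed as a stand-in.

```python
def rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive accuracy, FN rate, FP rate, and a balanced score from
    confusion-matrix counts for the grammar-compatibility task."""
    fn_rate = fn / (tp + fn)            # miss rate on compatible pairs
    fp_rate = fp / (fp + tn)            # false-alarm rate on incompatible pairs
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    balanced = ((1 - fn_rate) + (1 - fp_rate)) / 2  # balanced accuracy
    return {"accuracy": accuracy, "fn_rate": fn_rate,
            "fp_rate": fp_rate, "balanced": balanced}
```

Reporting FN and FP rates separately matters here because a model that labels everything "compatible" gets a low FN rate but a high FP rate; the balanced score penalizes that.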
-
Yoshito NAKAMURA, Mehmood FAISAL, Sakriani SAKTI
Article type: SIG paper
Pages: 86-91
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
The usage and study of conversational robots are expanding across various social domains. Additionally, modern robots are multilingual, enabling interaction with diverse groups. However, when a robot's secondary language isn't fully comprehended, individuals may feel ignored, rejected, or excluded, a phenomenon known as "ostracism." Although this has been studied in multilingual and code-switching contexts, it hasn't been specifically examined in human-robot interactions. Thus, investigating the psychological impact of robots' code-switching on users is crucial. This study explores how Japanese-English code-switching by conversational robots influences users' ostracism and SoBA (Sense of Being Attended to) scales. Specifically, this study also examined the differences in the impact of Japanese-English code-switching on men and women. The goal is to contribute to the development of more effective and inclusive human-robot interaction systems.
-
Anjyu TANAKA, Toshitake KOMAZAKI
Article type: SIG paper
Pages: 92-95
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This study examines how failures in robot speech affect human acceptance of robots. The analysis focuses on conversations between humans and robots that engage in casual conversation, rather than performing specific tasks like industrial robots. The robot used in this study combines an M5Stack, a speech synthesis server, a speech recognition server, and ChatGPT. The method involves measuring acceptance of the robot through a questionnaire survey before the conversation. Then, a robot-initiated conversation lasting about 15 minutes is conducted, during which intentional mishearing by the robot is introduced. After the conversation, acceptance of the robot is measured again through a questionnaire. The study integrates the changes in acceptance measured by the questionnaires with an analysis of human actions toward the robot during the conversation, focusing mainly on speech, gaze, and facial orientation.
-
Hinako KIZAWA, Yoshiko ARIMOTO
Article type: SIG paper
Pages: 96-101
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Taisei MAYUZUMI, Yoshiko ARIMOTO
Article type: SIG paper
Pages: 102-107
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Mei SAKAUE
Article type: SIG paper
Pages: 108-113
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Yasuyuki USUDA, Arata WATANABE
Article type: SIG paper
Pages: 114-120
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This study examines differences in gaze distribution between face-to-face and online conversation, focusing on gaze destination and distribution timing. Many studies have taken an interest in gaze because it plays important roles in forecasting turn changes and indicating attention or involvement in conversation. In online conversation, which has rapidly grown in popularity, it has been pointed out that gaze may be utilized differently than in face-to-face situations. Based on these observations, the authors collected video data of conversations in both settings from the same combinations of participants and analyzed the position and duration of gaze. The results are as follows: gaze position shows no significant difference, while the gaps between gaze onset and speech onset, and between gaze onset and speech termination, are significant. This suggests that the difference may be related to differences in sequence organization between the two settings.
-
Hiromitsu GOTO, Haruka WATANABE, Kanako EDAMOTO, Shun SHIRAMATSU, Taka ...
Article type: SIG paper
Pages: 121-126
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Hitoshi ONO, Itaru KURAMOTO
Article type: SIG paper
Pages: 127-132
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This research focuses on making spoken dialogue systems that use large language models (LLMs) respond faster. In a spoken dialogue system, an LLM response can only be generated after the entire text of the user's utterance has been recognized, which causes a delay before the system replies. To solve this problem, we propose a method that starts generating a response while the user is still speaking, without waiting for them to finish. However, this means the system must respond to potentially incomplete sentences, which could affect the content of the response. In this study, we analyze how response quality changes depending on how much of the user's sentence is missing.
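One way to realize "start generating before the user finishes" is speculative generation: begin an LLM call once the ASR partial hypothesis has stabilized, and keep the result only if the final transcript confirms the guess. The sketch below is an illustration of that idea, not the paper's method; the function names and stability threshold are assumptions.

```python
def should_start(partials: list[str], stable_for: int = 3) -> bool:
    """Trigger speculative generation once the last `stable_for` ASR
    partial hypotheses are identical (the hypothesis has stabilized)."""
    return len(partials) >= stable_for and len(set(partials[-stable_for:])) == 1

def keep_speculative(prefix: str, final: str) -> bool:
    """Keep the early response only if the final transcript actually
    begins with the prefix we generated against; otherwise regenerate."""
    return final.startswith(prefix)
```

The trade-off the paper analyzes lives in `should_start`: triggering earlier saves more latency but responds to a more incomplete sentence, which can degrade response content.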
-
Mikio NAKANO, Kazunori KOMATANI
Article type: SIG paper
Pages: 133-138
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This paper describes the design and implementation of DialBB-NC, a no-code dialogue system development tool. To apply dialogue system technology to various fields, it is desirable that even non-technical users can build dialogue systems. DialBB-NC enables the development of dialogue systems without coding, facilitating installation, configuration, dialogue knowledge editing, and application deployment. Dialogue knowledge can be described by combining state transition networks with large language model calls. DialBB-NC is implemented using the dialogue system development framework DialBB and, like DialBB, is released as open-source software.
-
Ryota NONOMURA, Hiroki MORI
Article type: SIG paper
Pages: 139-144
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
LLM (Large language model)-based multi-agent systems face significant challenges in achieving human-like natural dialogue. Existing systems rely on simplistic turn-taking models, failing to adequately reproduce the nuanced social interactions inherent in human conversations. This study proposes the Murder Mystery Agents (MMAgents) framework, implementing principles of conversational turn-taking discovered in conversation analysis research with a speaker selection mechanism based on adjacency pairs and turn-taking, and a self-selection method considering agents' internal states. Through experiments using a murder mystery game setting, it was confirmed that the dialogue coherence and reasoning capabilities were substantially improved. The experimental results also demonstrate reduced dialogue breakdowns and enhanced information sharing, offering novel design guidelines for multi-agent dialogue systems that incorporate human conversational norms.
-
Haruka ABE
Article type: SIG paper
Pages: 145-150
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Eri KATO
Article type: SIG paper
Pages: 151-154
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Ayaka KOKUBUN, Mika ENOMOTO
Article type: SIG paper
Pages: 155-159
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This study analyzes who retains the speaking turn after an interruptive overlapping utterance occurs within a speaker's turn-constructional unit. The analysis is based on the Chiba University Three-Person Corpus, examining two sets of conversations (each involving three female speakers and three male speakers) totaling approximately 20 minutes. The results of the analysis identified the following four patterns: (1) If the interrupted speaker continues speaking, they retain the speaking turn. (2) If the interrupted speaker stops speaking, the speaking turn shifts to the interrupter. (3) Even if the interrupted speaker continues speaking, if a third party responds to the interrupting utterance, the speaking turn shifts to the interrupter. (4) If both the interrupted speaker and the interrupter continue speaking, the interrupted speaker may respond to the interrupter, and the interrupter may respond to the interrupted speaker, leading to a situation where two speaking turns run in parallel.
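The four patterns above amount to a small decision rule over who keeps talking after the overlap. The function below is an illustrative encoding of the findings for clarity, not an analysis tool from the paper.

```python
def turn_after_overlap(interrupted_continues: bool,
                       interrupter_continues: bool,
                       third_party_backs_interrupter: bool = False) -> str:
    """Map the four observed patterns to the resulting floor state."""
    if interrupted_continues and interrupter_continues:
        return "parallel turns"            # pattern (4): two turns in parallel
    if interrupted_continues:
        if third_party_backs_interrupter:
            return "interrupter"           # pattern (3): third party decides
        return "interrupted speaker"       # pattern (1): turn retained
    return "interrupter"                   # pattern (2): speaker yielded
```

Pattern (3) is the interesting case: continuing to speak is not enough to hold the floor once a third participant ratifies the interruption.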
-
Chisaki MUNAKATA, Mika ENOMOTO
Article type: SIG paper
Pages: 160-164
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Ami TANAKA, Mika ENOMOTO
Article type: SIG paper
Pages: 165-170
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Kazushi KATO, Koji INOUE, Tatsuya KAWAHARA
Article type: SIG paper
Pages: 171-176
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
In human dialogue, nonverbal information such as nodding, eye contact, and facial expressions plays as important a role as verbal information. Spoken dialogue systems are therefore also required to express such nonverbal information appropriately. This study focuses on nodding as one nonverbal listener behavior and proposes a model to predict the timing and type of nods in real time. Nodding gestures were additionally recorded for attentive-listening dialogues, classified into three types according to their form, and annotated. We propose a Voice Activity Projection (VAP)-based model that takes both the listener's and the speaker's speech signals as input. The effectiveness of multi-task learning with backchannels and of pre-training on other dialogue data was confirmed. The proposed model can be implemented in a real-time avatar attentive-listening system.
-
Motoori TAKEUCHI, Koji INOUE, Tatsuya KAWAHARA
Article type: SIG paper
Pages: 177-182
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
Expressing emotion is crucial in dialogue systems, and the reaction of surprise, in particular, is not well explored. Surprise in dialogue can arise from various factors dependent on knowledge and context, such as unexpected developments or the rarity of events. In this study, we evaluated three methods for generating surprise responses in dialogue: (1) direct prediction using few-shot prompting with an LLM, (2) fine-tuning a BERT model on dialogue data, and (3) predicting the continuation of a dialogue with an LLM and judging surprise from the discrepancy with the actual utterance. Our experiments demonstrated that direct use of an LLM achieves the best performance. Further analysis of the reasoning behind GPT's judgments revealed instances where it incorrectly failed to exhibit surprise even in surprising situations, citing reasons such as "it's common and ordinary." This highlights the difficulty of accurately generating surprise responses and suggests directions for future improvement.
-
Akane FUKUSHIGE, Koji INOUE, Tatsuya KAWAHARA, Sanae YAMASHITA, Ryuich ...
Article type: SIG paper
Pages: 183-188
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Koji INOUE, Divesh LALA, Mikey ELMERS, Keiko OCHI, Tatsuya KAWAHARA
Article type: SIG paper
Pages: 189-194
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Rui SAKAIDA, Koji INOUE, Yui SAKAO, Yukiko NAKABAYASHI, Daisuke YOKOMO ...
Article type: SIG paper
Pages: 195-200
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
Our goal is to construct a framework for annotating the process of understanding the meaning of gestures in spoken conversations in a simple and versatile way. As a first step, we conduct sequential annotation of embodied actions on the movement scenes from a multimodal corpus of conversations between science communicators and visitors at the National Museum of Emerging Science and Innovation (Miraikan SC corpus). For a sequence in which a science communicator (SC) prompts a visitor to move by some movement or utterance, and the visitor follows and begins to move, we annotate the SC's first action and the visitor's second action. The first actions of SCs are annotated as walking, pointing, change of orientation, speech, and/or gesture. In this paper, we introduce the purpose and outline of the sequential annotation of embodied actions under development and the results of preliminary analyses based on the annotations.
-
Kazuki OKAZAKI, Yuichiro ENDO, Itaru KURAMOTO
Article type: SIG paper
Pages: 201-207
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
To build a good relationship between a conversational robot and a user, trust is important. However, it is not easy to establish trust in conversations with the robot. This study evaluates the naturalness of conversations as a fundamental aspect of building trust by analyzing dialogues between a conversational robot with a large language model and users from different age groups. As a result of experiments in which the robot conversed with both younger and older users, no significant differences were found in how the naturalness of the conversation was perceived across age groups. However, the results suggest that older users may have a more positive impression of the robot compared to younger users.
-
Miki TAGASHIRA
Article type: SIG paper
Pages: 208-213
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Natsumi KUBOTA, Sakti SAKRIANI
Article type: SIG paper
Pages
214-217
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
Stuttered speech presents significant challenges for automatic speech recognition (ASR) due to its irregular patterns and the scarcity of annotated data. This limitation hinders the development of robust systems capable of accurately recognizing and processing stuttered speech. To address these issues, this study proposes a novel approach that leverages text-to-speech (TTS) technology for data augmentation, enabling the synthesis of realistic stuttered speech to supplement existing datasets. Using this augmented data, this study develops an ASR system within a speech translation framework designed to transform stuttered speech into fluent text.
-
Ayano KITAHARA, Hiromichi HOSOMA
Article type: SIG paper
Pages: 218-221
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
How can synchronization of body movements be achieved when tap dancing between Deaf and hearing people? In this study, an observational record was made of a situation in which a hearing and sighted person acted as an instructor and practiced a tap dance number with Deaf, visually impaired, and hearing and sighted students. In this practice, the instructor needed to provide the audiovisual cues necessary for synchronization to all of the Deaf, visually impaired, and other students. Twenty-six examples were analyzed in ELAN to determine what kind of spatiotemporal structure is present in these cues when synchronization is achieved. The results showed that the instructor produced accurate cues not only aurally but also visually, by performing preliminary rhythmic movements (e.g., large upward swings) or movements with a clear starting point (e.g., a quick release of folded hands) immediately before emitting an auditory cue such as hand clapping. It was also found that these movements were developed on the spot after several attempts.
-
Mikimasa OMORI
Article type: SIG paper
Pages: 222
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Mayuka KONO, Yutaro HIRAO, Monica PERUSQUIAHERNANDEZ, Hideaki UCHIYAMA ...
Article type: SIG paper
Pages: 223-227
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
Understanding how children with ASD recognize language and objects is crucial for assessing the appropriateness of the support provided to them. However, it is difficult for others to comprehend these aspects. Previous research has attempted to replicate ASD-related visual characteristics to promote understanding, but such approaches have been insufficient to capture the highly individual traits of ASD. To address this, our study aims to contribute to elucidating the mechanisms of language cognition in children with ASD and to developing new support methods. Specifically, we focus on (1) generating diverse personas of children with ASD using LLMs and (2) establishing ASDKidsPersonaLLM, which incorporates these personas. In this paper, we investigate prompts that enable an LLM to distinguish between stories created by children with ASD and those created by typically developing children. We constructed a five-choice QA dataset to investigate whether the LLM can identify stories created by children with ASD, and improved classification accuracy to 33% by incorporating inferred problem-solving processes for the examples into the prompt.
-
Takumi SHABANA, Chiaki SAKAMA
Article type: SIG paper
Pages
228-233
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
Picture cards that illustrate emotions and actions with words are used to support children with developmental disorders and communication difficulties. Recently, AI image generators have been used for various purposes in several applications, but the production of picture cards still relies on manual work that is time-consuming and costly. This study aims to support emotional awareness and communication in therapy using generative AI in two respects. First, we propose a method for inferring the emotions and communication represented by picture card illustrations using a Large Language Model (LLM), improving the accuracy of word inference through fine-tuning. We evaluate whether the generated words correctly represent the emotions and communication depicted in the illustrations. Second, we introduce a method for generating picture card illustrations using the image generator Stable Diffusion, and verify whether the generated illustrations express emotions properly.
-
Toshiharu MATSUMOTO
Article type: SIG paper
Pages
234-238
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
Matsumoto studied the use of dialects among individuals with autism spectrum disorder (ASD) (Matsumoto, 2017). This research was prompted by a local belief in the Tsugaru region of Aomori Prefecture, Japan, that "children with autism do not speak the Tsugaru dialect." National survey results revealed a widespread perception across Japan that individuals with ASD do not use dialects, with particularly low use of dialect vocabulary. In addition, studies have reported that in regions where the language dominant in natural communication diverges significantly from that of the media (such as Iceland and Arabic-speaking areas), individuals with ASD tend to use the media-dominant language more frequently, suggesting a potential influence of media on language acquisition. Conversely, previous research on language development emphasizes the importance of social interaction in language learning, and media-based language acquisition has been strongly criticized. This paper proposes an integrated interpretation of the phenomena observed in Japan, Iceland, and North Africa, drawing on the perspectives of disability characteristics, societal language systems, and the evolution of media tools and content, while incorporating insights from previous studies of language development.
-
Keiko OCHI, Hanae KOISO, Mitsuru MAKUUCHI, Tatsuya KAWAHARA
Article type: SIG paper
Pages
239-241
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
The communication characteristics of adults and children with Autism Spectrum Disorder (ASD) have been studied in terms of turn-taking time and prosodic features. However, there are few studies that analyze conversations from the perspective that autistic traits exist on a continuum across the entire population to varying degrees. In this study, we investigated turn-taking and backchanneling characteristics of participants with high and low levels of autistic traits using the Corpus of Everyday Japanese Conversation (CEJC). The results showed that individuals with stronger autistic traits exhibited longer turn-taking gaps, whereas those with weaker autistic traits produced backchannels more frequently in response to their interlocutors.
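The turn-taking analysis described in this abstract rests on measuring the gap between one speaker's utterance ending and the next speaker's utterance beginning. A minimal sketch of that measurement, using an illustrative utterance format rather than the actual CEJC annotation scheme, might look like:

```python
# Hypothetical sketch of turn-taking gap measurement, in the spirit of the
# analysis above. The utterance tuples and timings are illustrative
# placeholders, not data from the CEJC corpus.
utterances = [
    # (speaker, start_sec, end_sec), sorted by start time
    ("A", 0.0, 2.1),
    ("B", 2.6, 4.0),
    ("A", 4.05, 6.3),
    ("B", 7.0, 8.2),
]

gaps = []
for prev, cur in zip(utterances, utterances[1:]):
    if cur[0] != prev[0]:                # speaker change = a turn transition
        gaps.append(cur[1] - prev[2])    # positive = gap, negative = overlap

mean_gap = sum(gaps) / len(gaps)
```

Comparing such mean gaps between high-trait and low-trait groups (alongside backchannel counts) is the kind of contrast the study reports.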
-
Rito SUZUKI, Shuhei TAKAHATA, Kei TERAYAMA, Yoshihiro KURODA, Naoto IE ...
Article type: SIG paper
Pages
242-247
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
In occupational therapy, there is a need to assess children's postural control abilities when screening for sensory integration disorder. However, occupational therapists' assessments are subjective, making quantitative evaluation of children's postural control difficult. Previous studies have attempted to address this issue using human pose estimation methods, but they used only a limited number of keypoints, such as the knees and elbows. In this study, we propose a model that predicts occupational therapists' subjective assessments. The model is based on a spatial-temporal graph convolutional network and leverages all keypoints obtained through a human pose estimation method. Experimental results showed a Spearman's correlation coefficient of 0.848 with the occupational therapists' evaluations. The findings further suggested that the lower body is more important than the upper body, and that, in addition to the previously considered knees and ankles, the relationship between the heels and toes is also crucial. These results could help identify previously overlooked but essential keypoint features, contributing to the development of more effective assessments in occupational therapy.
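The evaluation metric reported in this abstract, Spearman's rank correlation between model predictions and therapists' ratings, is simply the Pearson correlation of the two rank vectors, with ties assigned average ranks. A self-contained sketch with illustrative scores (not data from the study):

```python
# Minimal sketch of Spearman's rank correlation, the metric reported above.
# The therapist scores and model predictions are hypothetical placeholders.

def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                       # extend over a run of tied values
        avg = (i + j) / 2 + 1            # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

therapist_scores = [1, 2, 2, 3, 4, 5, 5]          # hypothetical ratings
model_predictions = [1.2, 1.9, 2.4, 2.8, 4.1, 4.7, 5.3]
rho = spearman(therapist_scores, model_predictions)
```

Because Spearman's rho depends only on rank order, it is well suited to comparing continuous model outputs against ordinal therapist ratings.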
-
Asumi SUZUKI, Michiru MAKUUCHI, Makoto WADA, Kimihiro NAKAMURA, Naomi ...
Article type: SIG paper
Pages
248-253
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Takeru ISAKA, Ryohei SAIJO, Shohei MATSUO, Iwaki TOSHIMA, Junichi SAWA ...
Article type: SIG paper
Pages
254-257
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
In workplaces with many people who have strong ASD tendencies, miscommunication is likely to occur, and the resulting experiences of communication failure contribute to a social problem of high turnover. In particular, the likelihood of interpreting ambiguous words differently from the speaker's intention increases with the strength of ASD tendencies, and this has been found to be a significant cause of miscommunication. To solve this problem, we are developing a tool that detects ambiguous words spoken in online meetings, notifies participants, and encourages speakers to rephrase them in clear terms. We introduced this tool for about two months into a workplace where many people with strong ASD tendencies worked, and observed a decrease in the number of ambiguous words detected during online meetings. In interviews, participants also reported effects such as "the number of reworks has decreased due to the clarification of content," confirming the effectiveness of introducing the tool into the workplace.
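The core of the tool described in this abstract is detecting ambiguous expressions in speech so the speaker can be prompted to rephrase. A toy sketch of that detection step, with an invented word list and a deliberately crude substring match (the actual tool's lexicon and matching rules are not described here):

```python
# Toy sketch of ambiguous-word flagging, in the spirit of the tool above.
# The AMBIGUOUS lexicon and the matching rule are illustrative placeholders.
AMBIGUOUS = {"soon", "some", "a few", "later", "properly"}

def flag_ambiguous(utterance: str) -> list[str]:
    """Return ambiguous expressions found in an utterance (sorted).

    Substring matching is intentionally simplistic; a real tool would
    operate on ASR output with tokenization and word boundaries.
    """
    lowered = utterance.lower()
    return [w for w in sorted(AMBIGUOUS) if w in lowered]

hits = flag_ambiguous("Please fix it properly and send it soon.")
```

In deployment, such hits would trigger a notification suggesting a concrete rephrasing, e.g. "send it by Friday" instead of "send it soon."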
-
Ryosaku MAKINO, Keisuke KADOTA, Atushi YAMAMOTO
Article type: SIG paper
Pages
258-261
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Hanae SUZUKI, Yasuhiro MINAMI, Tessei KOBAYASHI, Yumiko AKUTSU
Article type: SIG paper
Pages
262-267
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This study elucidates the individuality of the ways in which infants acquire vocabulary and, based on this understanding, attempts to classify vocabulary development processes. Specifically, vocabulary development data from infants with similar vocabulary sizes were subjected to topic analysis using Latent Dirichlet Allocation (LDA). By analyzing these topics, we investigated the individuality of the infants. Additionally, using these results, we constructed Support Vector Machines (SVMs) that take the ratios of the topics output for each infant's vocabulary as input vectors, and examined whether distinct types of individuality exist. Through this analysis, the study examines whether the process of vocabulary development in infants depends on factors other than the number of words acquired.
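The pipeline sketched in this abstract, LDA topic ratios over each infant's vocabulary fed into an SVM as feature vectors, can be illustrated with scikit-learn. Everything below (the toy vocabularies, the labels, the number of topics) is an invented placeholder, not the study's data or settings:

```python
# Hypothetical sketch of the LDA-topic-ratio + SVM pipeline described above.
# All data, labels, and hyperparameters are illustrative placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import SVC

# Toy vocabularies for six infants, each written as a space-joined word list.
vocabularies = [
    "mama dada ball dog cat milk",
    "car truck train bus wheel go",
    "mama milk cup spoon eat more",
    "dog cat bird fish duck moo",
    "car go vroom truck fast wheel",
    "ball dog mama milk cup dada",
]
labels = [0, 1, 0, 0, 1, 0]  # invented "individuality type" labels

# Bag-of-words counts, then LDA to obtain each infant's topic mixture.
counts = CountVectorizer().fit_transform(vocabularies)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_ratios = lda.fit_transform(counts)  # each row sums to 1

# SVM over the topic-ratio vectors as input features.
clf = SVC(kernel="linear").fit(topic_ratios, labels)
predictions = clf.predict(topic_ratios)
```

Using topic ratios rather than raw word counts compresses each vocabulary into a low-dimensional mixture, which is what makes grouping infants by developmental "type" tractable.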