-
Shiki SATO, Yuta SASAKI, Shinji IWATA, Takato YAMAZAKI, Masato KOMURO, ...
Article type: SIG paper
Pages: 01-08
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
Building upon the success of the six previous Dialogue System Live Competitions, we organized the seventh edition, "Dialogue System Live Competition 7." This competition series aims to highlight the challenges and limitations of human-computer dialogue in a live event setting. As in the prior edition, our focus was on multimodal dialogue systems. The competition featured two tracks: the "Situation Track" and the "Task Track." The Situation Track aims to develop human-like dialogue systems for specific scenarios, while the Task Track focuses on creating dialogue systems capable of completing complex and advanced tasks. In the preliminary round, 14 teams competed in the Situation Track and 8 teams in the Task Track. This paper provides an overview of the event and reports the results from the preliminary round. The final round is scheduled to be held as a live event at the 103rd Meeting of the Special Interest Group on Spoken Language Understanding and Dialogue Processing.
-
Masako OGATA, Yuri NAKAMURA, Shio ARIMA, Hirofumi KIKUCHI
Article type: SIG paper
Pages: 09-12
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This paper reports on the control mechanisms and dialogue strategies adopted in the dialogue system developed for the Dialogue System Live Competition 7 (Situation Track). In this competition, participants were required to develop a system that listens to users' complaints and supports their decision-making. To effectively engage with users' grievances, it is essential to encourage self-disclosure. Therefore, in this study, we designed the system to exhibit a character personality commonly associated with the so-called "gal" archetype, which actively expresses positive and empathetic interest toward users. Furthermore, to assist users in making decisions, we incorporated a role-playing strategy aimed at fostering awareness and understanding of emotions. As a result, our system achieved sixth place in the competition's preliminary round.
-
Chikara HANAKAWA, Kota YAMAMOTO, Sota KOBORI, Shinya FUJIE
Article type: SIG paper
Pages: 13-18
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
In this report, we discuss the design of prompts for a large language model in a spoken dialogue system that listens to complaints and supports decision-making. The system explicitly divides the dialogue into three phases: listening to complaints, supporting decision-making, and casual conversation. Information relevant to each phase is structured into slots, ensuring consistency in system behavior and facilitating appropriate phase transitions. Moreover, the system continuously estimates the underlying emotions behind the speaker's utterances, rather than merely capturing their surface meaning, and generates responses accordingly. This enhances the naturalness of empathetic and listening behaviors, such as expressing sympathy and asking relevant questions. Additionally, to improve the coherence of system behavior, we introduce both general considerations that apply across all phases and phase-specific guidelines, reducing subtle unnaturalness and discomfort. We demonstrate the effectiveness of this design by presenting actual prompts and dialogue examples.
-
Koushiro NUMADA, Yusuke OGAWA, Takuto HIRAKAWA, Keisuke KAMEYAMA, Kent ...
Article type: SIG paper
Pages: 19-23
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
We describe the prompt used in the system developed for the situation track at the 7th Dialogue System Live Competition. We observed that a prompt containing only the situational information specified by the organizers tends to lead to superficial conversation without engaging in concrete consultation. To address this issue, we designed a prompt that asks questions to clarify the user's intention to maintain the relationship with their friend and to elicit concrete personal episodes while expressing empathy. We embedded one example of person-to-person dialogue based on the situation into the prompt. In addition, by annotating the agent's voice style, emotions, and gestures for each utterance in the example dialogue, we attempted to achieve multimodal behaviors consistent with the system's responses.
-
Jundai SUZUKI, Junghoon LEE, Shizuya OSAWA, Eisaku MAEDA
Article type: SIG paper
Pages: 24-28
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Hiroki MATSUOKA, Shigehisa FUJITA, Atsushi HORIGUCHI, Kazuki MORI, Sac ...
Article type: SIG paper
Pages: 29-33
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
The dysfunctional thought record method, widely used in cognitive behavioral therapy (CBT), is a technique that involves describing thoughts and emotions related to a specific event, identifying cognitive distortions within them, and guiding individuals toward more realistic thinking. This approach is particularly expected to be useful in addressing difficult situations in interpersonal relationships. In this study, we (Team Careco) developed a multimodal dialogue system incorporating the thought record method in collaboration with users of disability welfare service facilities and participated in the preliminary round of the Situation Track at Dialogue System Live Competition 7. Our system is designed to analyze user concerns using GPT-4o, a generative AI, and automatically present questions aligned with the thought record method. Furthermore, it assesses the potential presence of cognitive distortions in users' responses and aims to support their decision-making. In the preliminary round, our system ranked 7th out of 14 teams.
-
Taiga MORI
Article type: SIG paper
Pages: 34-39
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This paper describes our system, which is designed based on the concept of membership categories in conversation analysis. The activity of consultation is carried out by participants who assume the two categories of "advisor" and "advisee." In many cases, the advisor is an expert, while the advisee is a non-expert, and the asymmetry in their knowledge defines their respective categories. However, in consultations on general topics, there is not necessarily a knowledge gap between the participants. To address this, the proposed system refers to the categories of "those who have experienced the problem" and "those currently experiencing the problem" by narrating the experience of dealing with a problematic friend, thereby establishing an asymmetric relationship and providing advice.
-
Moe NAGAO, Koichiro TERAO, Yuhi OGA, Naoto IWAHASHI, Yuta SASAKI, Taka ...
Article type: SIG paper
Pages: 40-45
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Sota KOBORI, Chikara HANAKAWA, Setsu ITO, Shinya FUJIE
Article type: SIG paper
Pages: 46-49
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
For the control strategy of a spoken dialogue system, we propose a method based on state management using multiple evaluation metrics and turn management with utterance completion prediction. The dialogue management is divided into two phases: condition confirmation and tourist spot explanation, each controlled by different evaluation metrics. The condition confirmation phase employs three metrics to gradually refine user requirements: condition ambiguity indicating search condition clarity, concretization difficulty representing requirement specification ease, and search condition sufficiency showing tourist spot search feasibility. The explanation phase provides tourist spot information while dynamically evaluating three metrics: rejection level indicating negative response strength, question detection determining question presence, and peripheral search necessity assessing the need for surrounding facility information. The system transitions between phases based on these evaluation metrics, flexibly updating search conditions in response to user reactions. The turn management system utilizes prediction based on utterance completion and context consideration, enabling responses at appropriate timing based on sentence-final expressions and utterance content. These implementations maintain natural dialogue flow while achieving effective tourist spot recommendations through systematic evaluation and control.
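The phase control described in this abstract can be sketched as a threshold rule over the condition-confirmation metrics. This is an illustrative reconstruction, not the authors' implementation: the metric names, value ranges, and thresholds below are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ConditionMetrics:
    """Metrics named for the condition-confirmation phase. Value ranges
    (0.0-1.0) and field names are illustrative assumptions."""
    ambiguity: float       # search-condition clarity (lower is clearer)
    concretization: float  # difficulty of specifying requirements
    sufficiency: float     # feasibility of a tourist-spot search

def next_phase(m: ConditionMetrics,
               ambiguity_max: float = 0.3,
               sufficiency_min: float = 0.7) -> str:
    """Transition to the explanation phase only once the search conditions
    are clear enough and sufficient for a tourist-spot search; otherwise
    keep refining them with the user."""
    if m.ambiguity <= ambiguity_max and m.sufficiency >= sufficiency_min:
        return "explanation"
    return "condition_confirmation"
```

A similar rule over the explanation-phase metrics (rejection level, question detection, peripheral-search necessity) would drive the transition back when the user rejects a recommendation.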
-
Keisuke KAMEYAMA, Syunsuke FUKAZAWA, Kenta YAMAMOTO, Kazunori KOMATANI
Article type: SIG paper
Pages: 50-55
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This paper describes our system submitted for the task track of the Dialogue System Live Competition 7. In task-oriented dialogue systems, dialogue management with an LLM sometimes fails to achieve task completion. To address this problem, we combine rule-based dialogue management by DialBB with response generation by an LLM. This enables the system to reliably determine the tourist attraction while maintaining flexibility in conversation. In addition, we use a VAD instead of the Remdis built-in VAP, because the latter caused unnecessary interruptions while the user was speaking. In the preliminary round, our system received positive feedback from users when the dialogue completed without trouble.
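The division of labor described in this abstract (rules guarantee task progress, the LLM supplies flexible phrasing) can be sketched as follows. This is a minimal illustration, not DialBB's actual API: the state names, transition table, and stubbed LLM call are assumptions.

```python
# A rule-based state machine (standing in for DialBB) decides *what* the
# system does next; an LLM (stubbed here) decides *how* to phrase it.
TRANSITIONS = {
    ("ask_area", "area_given"): "ask_interest",
    ("ask_interest", "interest_given"): "recommend_spot",
    ("recommend_spot", "accepted"): "done",
    ("recommend_spot", "rejected"): "ask_interest",
}

def manage(state: str, user_act: str) -> str:
    """Rules guarantee the task always progresses toward a decided spot,
    regardless of how the LLM phrases each individual response."""
    return TRANSITIONS.get((state, user_act), state)

def render(state: str, llm=lambda p: f"[LLM response for: {p}]") -> str:
    """The LLM only generates surface text for the state chosen by rules."""
    return llm(f"Generate a friendly utterance for dialogue state '{state}'")
```

The hybrid keeps the LLM from wandering off-task: even a poor generation cannot change the dialogue state, only its wording.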
-
Nozomi KIMATA, Qiujie WANG, Yoshiki TOMITA, Haruki HATAKEYAMA, Eisaku ...
Article type: SIG paper
Pages: 56-61
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
The system is composed of three modules: summarization, transition, and searching. In the summarization module, a large language model (LLM) summarizes past dialogue content to generate the necessary text for the next response. This enables the LLM to comprehensively understand the dialogue history, leading to more coherent and contextually relevant responses. The transition module allows the LLM to reference the summary and determine whether the objective of each phase has been achieved. If necessary, it facilitates the transition to the next phase. By employing appropriate prompts for each phase, the system ensures step-by-step, goal-oriented response generation. The search module is activated when suggesting tourist destinations. It estimates user preferences based on the summarized information and searches for destinations that align with those preferences. By integrating these three modules, the system maintains dialogue consistency while delivering personalized tourist recommendations that effectively meet user needs.
-
Tamotsu MIYAMA, Shogo OKADA
Article type: SIG paper
Pages: 62-67
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This paper reports on the travel guidance dialogue system developed for the Task Track of Dialogue System Live Competition 7. This system is based on Remdis and MMDAgent-EX and is designed to assist users in selecting an optimal travel destination by providing diverse candidate sightseeing locations. To manage the dialogue flow, the system utilizes a state transition diagram, dynamically generating and assigning prompts to effectively control interactions. For selecting candidate sightseeing locations, it employs GPT-4o fine-tuned with supervised learning using sightseeing data and applies the Chain-of-Thought (CoT) method to extract relevant locations. Additionally, the system incorporates Travel Viewer, which displays a standard set of 45 sightseeing images and maps on the screen, along with their descriptions, to support users in making informed travel decisions. Furthermore, this paper analyzes user evaluations from the preliminary round of Dialogue System Live Competition 7 and explores the design of an optimal travel guidance chatbot.
-
Chihiro ASADA, Mika ONUKI, Yuto GOTO, Masaki NOSE
Article type: SIG paper
Pages: 68-73
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This paper presents our spoken dialogue system developed for the 7th Dialogue System Live Competition. To accomplish the competition's travel-planning task while providing a better user experience, we designed the system to present evidence that matches user preferences using retrieval-augmented generation, in addition to various gestures and facial expressions. The system consists largely of three blocks. The first determines the next dialogue state and task based on evaluation results, enabling the system to respond flexibly to various topics. The second incorporates possible responses into prompts and generates convincing proposals with supporting images and maps. The last generates expressive turn-taking and virtual-agent behavior such as facial expressions, gestures, and backchannels. As a result, we placed second in the preliminary round with high evaluation scores.
-
Yuhi OGA, Koichiro TERAO, Moe NAGAO, Naoto IWAHASHI, Yuta SASAKI, Taka ...
Article type: SIG paper
Pages: 74-79
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Dong WANG
Article type: SIG paper
Pages: 80-85
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This study examines the potential applications of large language models (LLMs) in language education. To evaluate their ability to identify the compatibility between grammatical items and example sentences, we designed a task and conducted experiments. Using multiple LLMs, we compared their performance based on accuracy, false negative rate (FN rate), false positive rate (FP rate), and Balanced Score. Additionally, we confirmed that synthetic data could serve as a practical alternative. Future research should focus on developing high-quality synthetic data generation methods and expanding their applicability. The findings of this study are expected to contribute to the establishment of benchmarks for evaluating the grammatical competence of LLMs in natural language.
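The measures named in this abstract can all be derived from a binary confusion matrix, as sketched below. The paper's exact Balanced Score definition is not given here, so balanced accuracy (the mean of the true-positive and true-negative rates) is assumed as a stand-in.

```python
def rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive accuracy, FN rate, FP rate, and a balanced score from
    confusion-matrix counts for the grammar-compatibility task."""
    fn_rate = fn / (tp + fn)            # miss rate on compatible pairs
    fp_rate = fp / (fp + tn)            # false-alarm rate on incompatible pairs
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    balanced = ((1 - fn_rate) + (1 - fp_rate)) / 2  # balanced accuracy
    return {"accuracy": accuracy, "fn_rate": fn_rate,
            "fp_rate": fp_rate, "balanced": balanced}
```

Reporting FN and FP rates separately matters here because a model that labels everything "compatible" gets a low FN rate but a high FP rate; the balanced score penalizes that.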
-
Yoshito NAKAMURA, Mehmood FAISAL, Sakriani SAKTI
Article type: SIG paper
Pages: 86-91
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
The usage and study of conversational robots are expanding across various social domains. Additionally, modern robots are multilingual, enabling interaction with diverse groups. However, when a robot's secondary language isn't fully comprehended, individuals may feel ignored, rejected, or excluded, a phenomenon known as "ostracism." Although this has been studied in multilingual and code-switching contexts, it hasn't been specifically examined in human-robot interactions. Thus, investigating the psychological impact of robots' code-switching on users is crucial. This study explores how Japanese-English code-switching by conversational robots influences users' ostracism and SoBA (Sense of Being Attended to) scales. Specifically, this study also examined the differences in the impact of Japanese-English code-switching on men and women. The goal is to contribute to the development of more effective and inclusive human-robot interaction systems.
-
Anjyu TANAKA, Toshitake KOMAZAKI
Article type: SIG paper
Pages: 92-95
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This study examines how failures in robot speech affect human acceptance of robots. The analysis focuses on conversations between humans and robots that engage in casual conversation, rather than performing specific tasks like industrial robots. The robot used in this study combines an M5Stack, a speech synthesis server, a speech recognition server, and ChatGPT. The method involves measuring acceptance of the robot through a questionnaire survey before the conversation. Then, a robot-initiated conversation lasting about 15 minutes is conducted, during which intentional mishearing by the robot is introduced. After the conversation, acceptance of the robot is measured again through a questionnaire. The study integrates the changes in acceptance measured by the questionnaires with an analysis of human actions toward the robot during the conversation, focusing mainly on speech, gaze, and facial orientation.
-
Hinako KIZAWA, Yoshiko ARIMOTO
Article type: SIG paper
Pages: 96-101
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Taisei MAYUZUMI, Yoshiko ARIMOTO
Article type: SIG paper
Pages: 102-107
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Mei SAKAUE
Article type: SIG paper
Pages: 108-113
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Yasuyuki USUDA, Arata WATANABE
Article type: SIG paper
Pages: 114-120
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This study examines differences in gaze distribution between face-to-face and online conversation, focusing on gaze destination and distribution timing. Many studies have taken an interest in gaze because it plays important roles in forecasting turn changes and indicating attention or involvement in conversation. In online conversation, which has rapidly grown in popularity, it has been pointed out that gaze may be utilized differently than in face-to-face situations. Based on these observations, the authors collected video data of conversations in both settings from the same combinations of participants and analyzed the position and duration of gaze. The results are as follows: gaze position shows no significant difference, while the gaps between gaze onset and speech onset, and between gaze onset and speech termination, are significant. This suggests that the difference may be related to differences in sequence organization between the two settings.
-
Hiromitsu GOTO, Haruka WATANABE, Kanako EDAMOTO, Shun SHIRAMATSU, Taka ...
Article type: SIG paper
Pages: 121-126
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Hitoshi ONO, Itaru KURAMOTO
Article type: SIG paper
Pages: 127-132
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This research focuses on making spoken dialogue systems that use large language models (LLMs) respond faster. In a spoken dialogue system, an LLM response can only be generated after the entire text of the user's utterance has been recognized, which causes a delay before the system replies. To solve this problem, we propose a method that starts generating a response while the user is still speaking, without waiting for them to finish. However, this means the system must respond to potentially incomplete sentences, which could affect the content of the response. In this study, we analyze how response quality changes depending on how much of the user's sentence is missing.
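One way to realize "start generating before the user finishes" is speculative generation: begin an LLM call once the ASR partial hypothesis has stabilized, and keep the result only if the final transcript confirms the guess. The sketch below is an illustration of that idea, not the paper's method; the function names and stability threshold are assumptions.

```python
def should_start(partials: list[str], stable_for: int = 3) -> bool:
    """Trigger speculative generation once the last `stable_for` ASR
    partial hypotheses are identical (the hypothesis has stabilized)."""
    return len(partials) >= stable_for and len(set(partials[-stable_for:])) == 1

def keep_speculative(prefix: str, final: str) -> bool:
    """Keep the early response only if the final transcript actually
    begins with the prefix we generated against; otherwise regenerate."""
    return final.startswith(prefix)
```

The trade-off the paper analyzes lives in `should_start`: triggering earlier saves more latency but responds to a more incomplete sentence, which can degrade response content.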
-
Mikio NAKANO, Kazunori KOMATANI
Article type: SIG paper
Pages: 133-138
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This paper describes the design and implementation of DialBB-NC, a no-code dialogue system development tool. To apply dialogue system technology to various fields, it is desirable that even non-technical users can build dialogue systems. DialBB-NC enables the development of dialogue systems without coding, facilitating installation, configuration, dialogue knowledge editing, and application deployment. Dialogue knowledge can be described by combining state transition networks with large language model calls. DialBB-NC is implemented using the dialogue system development framework DialBB and, like DialBB, is released as open-source software.
-
Ryota NONOMURA, Hiroki MORI
Article type: SIG paper
Pages: 139-144
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
LLM (Large language model)-based multi-agent systems face significant challenges in achieving human-like natural dialogue. Existing systems rely on simplistic turn-taking models, failing to adequately reproduce the nuanced social interactions inherent in human conversations. This study proposes the Murder Mystery Agents (MMAgents) framework, implementing principles of conversational turn-taking discovered in conversation analysis research with a speaker selection mechanism based on adjacency pairs and turn-taking, and a self-selection method considering agents' internal states. Through experiments using a murder mystery game setting, it was confirmed that the dialogue coherence and reasoning capabilities were substantially improved. The experimental results also demonstrate reduced dialogue breakdowns and enhanced information sharing, offering novel design guidelines for multi-agent dialogue systems that incorporate human conversational norms.
-
Haruka ABE
Article type: SIG paper
Pages: 145-150
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Eri KATO
Article type: SIG paper
Pages: 151-154
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Ayaka KOKUBUN, Mika ENOMOTO
Article type: SIG paper
Pages: 155-159
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This study analyzes who retains the speaking turn after an interruptive overlapping utterance occurs within a speaker's turn-constructional unit. The analysis is based on the Chiba University Three-Person Corpus, examining two sets of conversations (each involving three female speakers and three male speakers) totaling approximately 20 minutes. The results of the analysis identified the following four patterns: (1) If the interrupted speaker continues speaking, they retain the speaking turn. (2) If the interrupted speaker stops speaking, the speaking turn shifts to the interrupter. (3) Even if the interrupted speaker continues speaking, if a third party responds to the interrupting utterance, the speaking turn shifts to the interrupter. (4) If both the interrupted speaker and the interrupter continue speaking, the interrupted speaker may respond to the interrupter, and the interrupter may respond to the interrupted speaker, leading to a situation where two speaking turns run in parallel.
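The four patterns above amount to a small decision rule over who keeps talking after the overlap. The function below is an illustrative encoding of the findings for clarity, not an analysis tool from the paper.

```python
def turn_after_overlap(interrupted_continues: bool,
                       interrupter_continues: bool,
                       third_party_backs_interrupter: bool = False) -> str:
    """Map the four observed patterns to the resulting floor state."""
    if interrupted_continues and interrupter_continues:
        return "parallel turns"            # pattern (4): two turns in parallel
    if interrupted_continues:
        if third_party_backs_interrupter:
            return "interrupter"           # pattern (3): third party decides
        return "interrupted speaker"       # pattern (1): turn retained
    return "interrupter"                   # pattern (2): speaker yielded
```

Pattern (3) is the interesting case: continuing to speak is not enough to hold the floor once a third participant ratifies the interruption.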
-
Chisaki MUNAKATA, Mika ENOMOTO
Article type: SIG paper
Pages: 160-164
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Ami TANAKA, Mika ENOMOTO
Article type: SIG paper
Pages: 165-170
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Kazushi KATO, Koji INOUE, Tatsuya KAWAHARA
Article type: SIG paper
Pages: 171-176
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
In human dialogue, nonverbal information such as nodding, eye contact, and facial expressions plays as important a role as verbal information. Spoken dialogue systems are therefore also required to express such nonverbal information appropriately. This study focuses on nodding as one nonverbal listener behavior and proposes a model to predict the timing and type of nods in real time. Nodding gestures were additionally recorded for attentive-listening dialogues, classified into three types according to their form, and annotated. We propose a Voice Activity Projection (VAP)-based model that takes both the listener's and the speaker's speech signals as input. The effectiveness of multi-task learning with backchannels and of pre-training on other dialogue data was confirmed. The proposed model can be implemented in a real-time avatar attentive-listening system.
-
Motoori TAKEUCHI, Koji INOUE, Tatsuya KAWAHARA
Article type: SIG paper
Pages: 177-182
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
Expressing emotion is crucial in dialogue systems, and the reaction of surprise, in particular, is not well explored. Surprise in dialogue can arise from various factors dependent on knowledge and context, such as unexpected developments or the rarity of events. In this study, we evaluated three methods for generating surprise responses in dialogue: (1) direct prediction using few-shot prompting with an LLM, (2) fine-tuning a BERT model on dialogue data, and (3) predicting the continuation of a dialogue with an LLM and judging surprise from the discrepancy with the actual utterance. Our experiments demonstrated that direct use of an LLM achieves the best performance. Further analysis of the reasoning behind GPT's judgments revealed instances where it incorrectly failed to exhibit surprise even in surprising situations, citing reasons such as "it's common and ordinary." This highlights the difficulty of accurately generating surprise responses and suggests directions for future improvement.
-
Akane FUKUSHIGE, Koji INOUE, Tatsuya KAWAHARA, Sanae YAMASHITA, Ryuich ...
Article type: SIG paper
Pages: 183-188
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Koji INOUE, Divesh LALA, Mikey ELMERS, Keiko OCHI, Tatsuya KAWAHARA
Article type: SIG paper
Pages: 189-194
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Rui SAKAIDA, Koji INOUE, Yui SAKAO, Yukiko NAKABAYASHI, Daisuke YOKOMO ...
Article type: SIG paper
Pages: 195-200
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
Our goal is to construct a framework for annotating the process of understanding the meaning of gestures in spoken conversations in a simple and versatile way. As a first step, we conduct sequential annotation of embodied actions on the movement scenes from a multimodal corpus of conversations between science communicators and visitors at the National Museum of Emerging Science and Innovation (Miraikan SC corpus). For a sequence in which a science communicator (SC) prompts a visitor to move by some movement or utterance, and the visitor follows and begins to move, we annotate the SC's first action and the visitor's second action. The first actions of SCs are annotated as walking, pointing, change of orientation, speech, and/or gesture. In this paper, we introduce the purpose and outline of the sequential annotation of embodied actions under development and the results of preliminary analyses based on the annotations.
-
Kazuki OKAZAKI, Yuichiro ENDO, Itaru KURAMOTO
Article type: SIG paper
Pages: 201-207
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
To build a good relationship between a conversational robot and a user, trust is important. However, it is not easy to establish trust in conversations with the robot. This study evaluates the naturalness of conversations as a fundamental aspect of building trust by analyzing dialogues between a conversational robot with a large language model and users from different age groups. As a result of experiments in which the robot conversed with both younger and older users, no significant differences were found in how the naturalness of the conversation was perceived across age groups. However, the results suggest that older users may have a more positive impression of the robot compared to younger users.
-
Miki TAGASHIRA
Article type: SIG paper
Pages: 208-213
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Natsumi KUBOTA, Sakti SAKRIANI
Article type: SIG paper
Pages
214-217
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
Stuttered speech presents significant challenges for automatic speech recognition (ASR) due to its irregular patterns and the scarcity of annotated data. This limitation hinders the development of robust systems capable of accurately recognizing and processing stuttered speech. To address these issues, this study proposes a novel approach that leverages text-to-speech (TTS) technology for data augmentation, enabling the synthesis of realistic stuttered speech to supplement existing datasets. Using this augmented data, this study develops an ASR system within a speech translation framework designed to transform stuttered speech into fluent text.
-
Ayano KITAHARA, Hiromichi HOSOMA
Article type: SIG paper
Pages: 218-221
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
How can synchronization of body movements be achieved when tap dancing between Deaf and hearing people? In this study, an observational record was made of a situation in which a hearing and sighted person acted as an instructor and practiced a tap dance number with Deaf, visually impaired, and hearing and sighted students. In this practice, the instructor needed to provide the audiovisual cues necessary for synchronization to all of the Deaf, visually impaired, and other students. Twenty-six examples were analyzed in ELAN to determine what kind of spatiotemporal structure is present in these cues when synchronization is achieved. The results showed that the instructor produced accurate cues not only aurally but also visually, by performing preliminary rhythmic movements (e.g., large upward swings) or movements with a clear starting point (e.g., a quick release of folded hands) immediately before emitting an auditory cue such as hand clapping. It was also found that these movements were developed on the spot after several attempts.
-
Mikimasa OMORI
Article type: SIG paper
Pages: 222
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Mayuka KONO, Yutaro HIRAO, Monica PERUSQUIAHERNANDEZ, Hideaki UCHIYAMA ...
Article type: SIG paper
Pages: 223-227
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
Understanding how children with ASD recognize language and objects is crucial for assessing the appropriateness of the support provided to them. However, it is difficult for others to comprehend these aspects. Previous research has attempted to replicate ASD-related visual characteristics to promote understanding, but such approaches have been insufficient to capture the highly individual traits of ASD. To address this, our study aims to contribute to elucidating the mechanisms of language cognition in children with ASD and to developing new support methods. Specifically, we focus on (1) generating diverse personas of children with ASD using LLMs and (2) establishing ASDKidsPersonaLLM, which incorporates these personas. In this paper, we investigate prompts that enable an LLM to distinguish between stories created by children with ASD and those created by typically developing children. We constructed a five-choice QA dataset to investigate whether the LLM can identify stories created by children with ASD, and improved classification accuracy to 33% by incorporating inferred problem-solving processes for the examples into the prompt.
-
Takumi SHABANA, Chiaki SAKAMA
Article type: SIG paper
Pages
228-233
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
Picture cards that illustrate emotions and actions with words are used to support children with developmental disorders and communication difficulties. Recently, AI image generators have been used for various purposes in several applications, but the production of picture cards still relies on manual work that is time-consuming and costly. This study aims to support emotional awareness and communication in therapy using generative AI in two respects. First, we propose a method for inferring the emotions and communication represented by picture card illustrations using a Large Language Model (LLM), improving the accuracy of word inference through fine-tuning. We evaluate whether the generated words correctly represent the emotions and communication depicted in the illustrations. Second, we introduce a method for generating picture card illustrations using the image generator Stable Diffusion, and verify whether the generated illustrations express emotions properly.
-
Toshiharu MATSUMOTO
Article type: SIG paper
Pages
234-238
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
Matsumoto studied the use of dialects among individuals with autism spectrum disorder (ASD) (Matsumoto, 2017). This research was prompted by a local belief in the Tsugaru region of Aomori Prefecture, Japan, that "children with autism do not speak the Tsugaru dialect." National survey results revealed a widespread perception across Japan that individuals with ASD do not use dialects, with particularly low use of dialect vocabulary. In addition, studies have reported that in regions where the language dominant in natural communication diverges significantly from that of the media (such as Iceland and Arabic-speaking areas), individuals with ASD tend to use the media-dominant language more frequently, suggesting a potential influence of media on language acquisition. Conversely, previous research on language development emphasizes the importance of social interaction in language learning, and media-based language acquisition has been strongly criticized. This paper proposes an integrated interpretation of the phenomena observed in Japan, Iceland, and North Africa, drawing on the perspectives of disability characteristics, societal language systems, and the evolution of media tools and content, while incorporating insights from previous studies of language development.
-
Keiko OCHI, Hanae KOISO, Mitsuru MAKUUCHI, Tatsuya KAWAHARA
Article type: SIG paper
Pages
239-241
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
The communication characteristics of adults and children with Autism Spectrum Disorder (ASD) have been studied in terms of turn-taking time and prosodic features. However, there are few studies that analyze conversations from the perspective that autistic traits exist on a continuum across the entire population to varying degrees. In this study, we investigated turn-taking and backchanneling characteristics of participants with high and low levels of autistic traits using the Corpus of Everyday Japanese Conversation (CEJC). The results showed that individuals with stronger autistic traits exhibited longer turn-taking gaps, whereas those with weaker autistic traits produced backchannels more frequently in response to their interlocutors.
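The turn-taking analysis described in this abstract rests on measuring the gap between one speaker's utterance ending and the next speaker's utterance beginning. A minimal sketch of that measurement, using an illustrative utterance format rather than the actual CEJC annotation scheme, might look like:

```python
# Hypothetical sketch of turn-taking gap measurement, in the spirit of the
# analysis above. The utterance tuples and timings are illustrative
# placeholders, not data from the CEJC corpus.
utterances = [
    # (speaker, start_sec, end_sec), sorted by start time
    ("A", 0.0, 2.1),
    ("B", 2.6, 4.0),
    ("A", 4.05, 6.3),
    ("B", 7.0, 8.2),
]

gaps = []
for prev, cur in zip(utterances, utterances[1:]):
    if cur[0] != prev[0]:                # speaker change = a turn transition
        gaps.append(cur[1] - prev[2])    # positive = gap, negative = overlap

mean_gap = sum(gaps) / len(gaps)
```

Comparing such mean gaps between high-trait and low-trait groups (alongside backchannel counts) is the kind of contrast the study reports.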
-
Rito SUZUKI, Shuhei TAKAHATA, Kei TERAYAMA, Yoshihiro KURODA, Naoto IE ...
Article type: SIG paper
Pages
242-247
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
In occupational therapy, there is a need to assess children's postural control abilities when screening for sensory integration disorder. However, occupational therapists' assessments are subjective, making quantitative evaluation of children's postural control difficult. Previous studies have attempted to address this issue using human pose estimation methods, but they used only a limited number of keypoints, such as the knees and elbows. In this study, we propose a model that predicts occupational therapists' subjective assessments. The model is based on a spatial-temporal graph convolutional network and leverages all keypoints obtained through a human pose estimation method. Experimental results showed a Spearman's correlation coefficient of 0.848 with the occupational therapists' evaluations. The findings further suggested that the lower body is more important than the upper body, and that, in addition to the previously considered knees and ankles, the relationship between the heels and toes is also crucial. These results could help identify previously overlooked but essential keypoint features, contributing to the development of more effective assessments in occupational therapy.
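The evaluation metric reported in this abstract, Spearman's rank correlation between model predictions and therapists' ratings, is simply the Pearson correlation of the two rank vectors, with ties assigned average ranks. A self-contained sketch with illustrative scores (not data from the study):

```python
# Minimal sketch of Spearman's rank correlation, the metric reported above.
# The therapist scores and model predictions are hypothetical placeholders.

def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                       # extend over a run of tied values
        avg = (i + j) / 2 + 1            # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

therapist_scores = [1, 2, 2, 3, 4, 5, 5]          # hypothetical ratings
model_predictions = [1.2, 1.9, 2.4, 2.8, 4.1, 4.7, 5.3]
rho = spearman(therapist_scores, model_predictions)
```

Because Spearman's rho depends only on rank order, it is well suited to comparing continuous model outputs against ordinal therapist ratings.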
-
Asumi SUZUKI, Michiru MAKUUCHI, Makoto WADA, Kimihiro NAKAMURA, Naomi ...
Article type: SIG paper
Pages
248-253
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Takeru ISAKA, Ryohei SAIJO, Shohei MATSUO, Iwaki TOSHIMA, Junichi SAWA ...
Article type: SIG paper
Pages
254-257
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
In workplaces with many people who have strong ASD tendencies, miscommunication is likely to occur, and the resulting experiences of communication failure contribute to a social problem of high turnover. In particular, the likelihood of interpreting ambiguous words differently from the speaker's intention increases with the strength of ASD tendencies, and this has been found to be a significant cause of miscommunication. To solve this problem, we are developing a tool that detects ambiguous words spoken in online meetings, notifies participants, and encourages speakers to rephrase them in clear terms. We introduced this tool for about two months into a workplace where many people with strong ASD tendencies worked, and observed a decrease in the number of ambiguous words detected during online meetings. In interviews, participants also reported effects such as "the number of reworks has decreased due to the clarification of content," confirming the effectiveness of introducing the tool into the workplace.
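The core of the tool described in this abstract is detecting ambiguous expressions in speech so the speaker can be prompted to rephrase. A toy sketch of that detection step, with an invented word list and a deliberately crude substring match (the actual tool's lexicon and matching rules are not described here):

```python
# Toy sketch of ambiguous-word flagging, in the spirit of the tool above.
# The AMBIGUOUS lexicon and the matching rule are illustrative placeholders.
AMBIGUOUS = {"soon", "some", "a few", "later", "properly"}

def flag_ambiguous(utterance: str) -> list[str]:
    """Return ambiguous expressions found in an utterance (sorted).

    Substring matching is intentionally simplistic; a real tool would
    operate on ASR output with tokenization and word boundaries.
    """
    lowered = utterance.lower()
    return [w for w in sorted(AMBIGUOUS) if w in lowered]

hits = flag_ambiguous("Please fix it properly and send it soon.")
```

In deployment, such hits would trigger a notification suggesting a concrete rephrasing, e.g. "send it by Friday" instead of "send it soon."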
-
Ryosaku MAKINO, Keisuke KADOTA, Atushi YAMAMOTO
Article type: SIG paper
Pages
258-261
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
-
Hanae SUZUKI, Yasuhiro MINAMI, Tessei KOBAYASHI, Yumiko AKUTSU
Article type: SIG paper
Pages
262-267
Published: March 06, 2025
Released on J-STAGE: March 06, 2025
CONFERENCE PROCEEDINGS
RESTRICTED ACCESS
This study elucidates the individuality of the ways in which infants acquire vocabulary and, based on this understanding, attempts to classify vocabulary development processes. Specifically, vocabulary development data from infants with similar vocabulary sizes were subjected to topic analysis using Latent Dirichlet Allocation (LDA). By analyzing these topics, we investigated the individuality of the infants. Additionally, using these results, we constructed Support Vector Machines (SVMs) that take the ratios of the topics output for each infant's vocabulary as input vectors, and examined whether distinct types of individuality exist. Through this analysis, the study examines whether the process of vocabulary development in infants depends on factors other than the number of words acquired.
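The pipeline sketched in this abstract, LDA topic ratios over each infant's vocabulary fed into an SVM as feature vectors, can be illustrated with scikit-learn. Everything below (the toy vocabularies, the labels, the number of topics) is an invented placeholder, not the study's data or settings:

```python
# Hypothetical sketch of the LDA-topic-ratio + SVM pipeline described above.
# All data, labels, and hyperparameters are illustrative placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import SVC

# Toy vocabularies for six infants, each written as a space-joined word list.
vocabularies = [
    "mama dada ball dog cat milk",
    "car truck train bus wheel go",
    "mama milk cup spoon eat more",
    "dog cat bird fish duck moo",
    "car go vroom truck fast wheel",
    "ball dog mama milk cup dada",
]
labels = [0, 1, 0, 0, 1, 0]  # invented "individuality type" labels

# Bag-of-words counts, then LDA to obtain each infant's topic mixture.
counts = CountVectorizer().fit_transform(vocabularies)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_ratios = lda.fit_transform(counts)  # each row sums to 1

# SVM over the topic-ratio vectors as input features.
clf = SVC(kernel="linear").fit(topic_ratios, labels)
predictions = clf.predict(topic_ratios)
```

Using topic ratios rather than raw word counts compresses each vocabulary into a low-dimensional mixture, which is what makes grouping infants by developmental "type" tractable.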