Building a Spoken Dialogue System Using Voice Activity Prediction and Objectively Evaluating Its Naturalness

樋口 栄作; 山本 智幸; 吉田 茂人

doi:10.11517/pjsai.JSAI2025.0_3G5GS602

39th (2025)

Session ID : 3G5-GS-6-02

Conference information

Host: The Japanese Society for Artificial Intelligence

Name : The 39th Annual Conference of the Japanese Society for Artificial Intelligence

Number : 39

Location : [in Japanese]

Date : May 27, 2025 - May 30, 2025

A Speech Dialogue System Utilizing Voice Activity Prediction and Objective Evaluation of Naturalness

*Eisaku HIGUCHI, Tomoyuki YAMAMOTO, Shigeto YOSHIDA

Keywords: Large Language Model, Dialogue Systems, Turn-taking, Natural Language Processing, Interjections

Abstract

With advances in natural language processing, dialogue systems that handle continuous speech are becoming increasingly prevalent. Systems that produce backchannels, however, can disrupt the natural flow of conversation when their responses are delayed or interrupt ongoing speech. Evaluating such systems is also difficult, because backchanneling is hard to separate from the main dialogue. In this study, we focus on turn-taking to achieve natural interaction that includes backchanneling, and we develop a dialogue system that uses Voice Activity Projection (VAP). The system predicts the start and end times of conversations, which makes it possible to distinguish backchanneling from interruptive speech. Experiments confirmed an improvement in naturalness, indicating that the approach is effective for future dialogue system development.
