Host: The Japanese Society for Artificial Intelligence
Name: 34th Annual Conference, 2020
Number: 34
Location: Online
Date: June 09, 2020 - June 12, 2020
In sentence generation with deep neural networks, e.g., machine translation, automatic summarization, and dialogue response generation, approaches that improve model performance by raising the quality of the training data have attracted increasing attention. In this paper, we propose a scoring function that detects low-quality utterance-response pairs in training data in order to improve the performance of a neural dialogue response generation model. Specifically, our function combines two viewpoints, "typical phrase interconnection" and "topic consistency", to rate how plausible two consecutive utterances are as a dialogue. In our experiments, we apply the proposed method to conversation data in multiple languages and show that the proposed score correlates with human subjective ratings. Moreover, we show that filtering the training data with our score improves the performance of response generation models under both automatic and human evaluation.
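The abstract describes scoring each utterance-response pair and filtering out low-scoring pairs before training. The following is a minimal sketch of that filtering step, assuming the two component scores are available as callables; `phrase_interconnection_score`, `topic_consistency_score`, the interpolation weight `alpha`, and the cutoff `threshold` are illustrative assumptions, not the authors' actual formulation.

```python
from typing import Callable, Iterable

Pair = tuple[str, str]  # (utterance, response)


def filter_pairs(
    pairs: Iterable[Pair],
    phrase_interconnection_score: Callable[[str, str], float],  # assumed interface
    topic_consistency_score: Callable[[str, str], float],       # assumed interface
    alpha: float = 0.5,      # assumed weight balancing the two viewpoints
    threshold: float = 0.3,  # assumed cutoff below which a pair is discarded
) -> list[Pair]:
    """Keep only pairs whose combined plausibility score clears the threshold."""
    kept = []
    for utterance, response in pairs:
        # Combine the two viewpoints into a single plausibility score.
        score = (
            alpha * phrase_interconnection_score(utterance, response)
            + (1.0 - alpha) * topic_consistency_score(utterance, response)
        )
        if score >= threshold:
            kept.append((utterance, response))
    return kept
```

Under this sketch, the filtered pairs would then be used as the training set for the response generation model; how the two scores are actually computed and combined is specified in the paper itself, not here.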