The Gap Between Semantic Textual Similarity Benchmark Tasks and Real-World Application Tasks

阿部 香央莉; 横井 祥; 梶原 智之; 乾 健太郎

doi:10.11517/pjsai.JSAI2022.0_4Yin217

Abstract

The Semantic Textual Similarity (STS) task measures a system's ability to evaluate the similarity between two sentences, an ability required by downstream tasks such as machine translation evaluation and related passage retrieval. Several NLP researchers have discussed system performance on this ability using benchmark datasets. However, a system that is rated highly on a benchmark dataset may not be as effective in actual downstream tasks. In this study, we examined this gap between STS and downstream tasks, clarified which factors are important when evaluating the similarity between two sentences in downstream tasks, and discussed a policy for improving the benchmark dataset.
