Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
Quantifying Appropriateness of Summarization Data for Curriculum Learning
Ryuji Kano, Tomoki Taniguchi, Tomoko Ohkuma
2022 Volume 29 Issue 1 Pages 144-165

Abstract

Previous research on summarization models regards titles as summaries of source texts. However, many studies have reported that such training data are noisy. We propose an effective curriculum learning method for training summarization models on noisy data. Curriculum learning improves performance by sorting training data by difficulty or noisiness, and is effective for training models on noisy data. However, previous research has not applied curriculum learning to summarization tasks. One aim of this research is to validate the effectiveness of curriculum learning for summarization tasks. In translation tasks, previous research quantified noise using two models trained on noisy and clean corpora. Because such corpora do not exist in the summarization field, this method is difficult to apply to summarization tasks. Another aim of this research is to propose a model that can quantify noise using a single noisy corpus. The training task of the proposed model, the Appropriateness Estimator, is to distinguish correct source-summary pairs from randomly assigned pairs. Through this training, the model learns to compute the appropriateness of source-summary pairs. We conduct experiments with three summarization models and verify that curriculum learning and our method improve performance.
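The sketch below illustrates the idea described in the abstract: a binary classifier is trained to distinguish correct source-summary pairs from randomly re-assigned ones, and its scores are then used to order training examples for a curriculum. The encoder, model names, and training details are illustrative assumptions for this sketch, not the paper's actual architecture.

```python
# Hypothetical sketch of an "Appropriateness Estimator": a classifier that
# scores whether a (source, summary) pair is a true pair; scores are then
# used to sort the noisy corpus for curriculum learning. The bag-of-words
# encoder and hyperparameters are assumptions, not the paper's design.
import random
import torch
import torch.nn as nn

class AppropriatenessEstimator(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 64):
        super().__init__()
        # Separate bag-of-words encoders for source and summary (assumed).
        self.src_enc = nn.EmbeddingBag(vocab_size, dim)
        self.sum_enc = nn.EmbeddingBag(vocab_size, dim)
        self.scorer = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, src: torch.Tensor, summ: torch.Tensor) -> torch.Tensor:
        # Returns a logit: high = the pair looks like a correct pair.
        h = torch.cat([self.src_enc(src), self.sum_enc(summ)], dim=-1)
        return self.scorer(h).squeeze(-1)

def train_estimator(model, pairs, epochs: int = 3):
    """pairs: list of (source_ids, summary_ids) 1-D LongTensors."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for src, summ in pairs:
            # Negative example: the same source with a random summary.
            neg_summ = random.choice(pairs)[1]
            logits = torch.stack([
                model(src.unsqueeze(0), summ.unsqueeze(0)).squeeze(0),
                model(src.unsqueeze(0), neg_summ.unsqueeze(0)).squeeze(0),
            ])
            loss = loss_fn(logits, torch.tensor([1.0, 0.0]))
            opt.zero_grad()
            loss.backward()
            opt.step()

def curriculum_order(model, pairs):
    # Sort pairs from most to least appropriate, so a summarization model
    # sees the cleanest examples first (one common curriculum schedule).
    with torch.no_grad():
        scores = [model(s.unsqueeze(0), t.unsqueeze(0)).item()
                  for s, t in pairs]
    return [p for _, p in sorted(zip(scores, pairs), key=lambda x: -x[0])]
```

After training, `curriculum_order` would feed the summarizer the highest-appropriateness pairs first; other schedules (e.g., gradually mixing in noisier pairs) are equally compatible with the same scores.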

© 2022 The Association for Natural Language Processing