2020 Volume 27 Issue 4 Pages 825-852
We address extractive summarization of questions. Neural extractive summarizers typically require large amounts of labeled training data, and such labels are difficult to obtain, especially for user-generated content such as questions posted on community question answering services. In this paper, we propose semi-supervised extractive summarizers for such questions that exploit question-answer pairs to alleviate the shortage of labeled data. To this end, we propose several learning methods, namely pretraining, multi-task learning, distant supervision, and sampling methods, and examine how to use such unlabeled paired data effectively. Experimental results show that multi-task training performs well when combined with an appropriate sampling method or distant supervision, especially when the amount of labeled data is small.