2020, Vol. 35, No. 6, pp. B-K46_1-8
Pointer-generator, one of the strong baselines among neural summarization models, generates summaries by selecting words from a set of words (the output vocabulary) and from the words in the source document. The conventional method for constructing the output vocabulary collects words that occur frequently in the summaries of the training data. However, words that are frequent in summaries are also likely to be frequent in source documents. An output vocabulary constructed by the conventional method is therefore redundant for pointer-generator, because pointer-generator can copy words from source documents. We propose a vocabulary construction method that, for each source-summary pair, selects words that are included in the summary but not in its source text. Experimental results on the CNN/Daily Mail corpus and the NEWSROOM corpus showed that our method contributes to improved ROUGE scores while achieving a high ratio of generated novel words, that is, words that do not occur in the source documents.
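The contrast between the two vocabulary construction methods can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenization, the ranking of candidate words by frequency, the `size` cutoff, and the function names `conventional_vocab` and `novel_word_vocab` are all assumptions made for illustration, and the toy pairs are invented.

```python
from collections import Counter


def conventional_vocab(pairs, size):
    """Conventional method: the most frequent words in the summaries."""
    counts = Counter()
    for _source, summary in pairs:
        counts.update(summary.split())  # assumption: whitespace tokenization
    return [word for word, _ in counts.most_common(size)]


def novel_word_vocab(pairs, size):
    """Sketch of the proposed idea: count only words that appear in a
    summary but not in the source text of the same pair.

    Ranking the surviving words by frequency is an assumption; the
    abstract does not say how candidates are ranked or truncated.
    """
    counts = Counter()
    for source, summary in pairs:
        source_words = set(source.split())
        counts.update(w for w in summary.split() if w not in source_words)
    return [word for word, _ in counts.most_common(size)]


# Invented toy data: (source document, summary) pairs.
pairs = [
    ("the cat sat on the mat", "the cat rested"),
    ("the dogs bark at night", "the dog barked"),
]

print(conventional_vocab(pairs, 1))
print(sorted(novel_word_vocab(pairs, 10)))
```

In this toy data, "the" is the most frequent summary word, so the conventional method spends a vocabulary slot on it even though it can be copied from either source. The second function keeps only words such as "rested", "dog", and "barked", which cannot be copied from the paired source.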