人工知能学会論文誌
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
原著論文
ニューラル要約モデルのための原文書と要約対の差分に基づく語彙構築方法
牧野 拓哉岩倉 友哉
著者情報
ジャーナル フリー

2020 年 35 巻 6 号 p. B-K46_1-8

詳細
抄録

Pointer-generator, which is the one of the strong baselines in neural summarization models, generates summaries by selecting words from a set of words (output vocabulary) and words in source documents. A conventional method for constructing output vocabulary collects highly frequent words in summaries of training data. However, highly frequent words in summaries could be usually a high possibility to be frequent in source documents. Thus, an output vocabulary constructed by the conventional method is redundant for pointer-generator because pointergenerator can copy words in source documents. We propose a vocabulary construction method that selects words included in each summary but not included in its source text of each pair. Experimental results on CNN/Daily Mail corpus and NEWSROOM corpus showed that our method contributes to improved ROUGE scores while obtaining high ratios of generating novel words that do not occur in source documents.

著者関連情報
© 人工知能学会 2020
前の記事 次の記事
feedback
Top