Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper
The Effectiveness of Data Augmentation by Removing Unimportant Sentence
Tomohito Ouchi, Masayoshi Tabuse

2021 Volume 28 Issue 2 Pages 350-379

Abstract

In recent years, the amount of information on the Internet has increased exponentially. Consequently, automatic article summarisation technology will become indispensable. In this study, we propose a data augmentation method for automatic summarisation systems. The proposed method removes the least important sentence from an article. We used a topic model to determine the importance of the sentences in each article, and used the Luhn and LexRank methods as comparative methods for determining sentence importance. Additionally, we used Easy Data Augmentation (EDA) techniques, a data augmentation method applied to document classification, as a further comparison method. Comparative experiments were performed using input datasets of 28,000, 57,000, and 287,226 articles. The Luhn and LexRank methods always produced the worst results, while EDA sometimes performed worse than the baseline without data augmentation. The proposed method performed best in all cases.
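The abstract does not specify how the topic-model importance scores are computed, so the following Python sketch only illustrates the overall augmentation step under stated assumptions: each article is already split into a list of sentences, the reference summary is reused unchanged for the shortened article, and augmented copies are added alongside the original training pairs. The scorer `frequency_scores` is a hypothetical stand-in (mean within-article frequency of a sentence's tokens); it is neither the paper's topic-model scorer nor the Luhn or LexRank method. Any function that maps a list of sentences to one score per sentence can be passed as `score_fn`.

```python
import re
from collections import Counter
from typing import Callable, List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercased word tokens; a real system would use a proper tokenizer.
    return re.findall(r"[a-z0-9']+", text.lower())


def frequency_scores(sentences: List[str]) -> List[float]:
    """Hypothetical stand-in scorer: mean in-article frequency of each
    sentence's tokens. NOT the topic-model scorer used in the paper."""
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    scores = []
    for s in sentences:
        toks = tokenize(s)
        scores.append(sum(counts[t] for t in toks) / len(toks) if toks else 0.0)
    return scores


def augment_by_removing_least_important(
    sentences: List[str],
    summary: str,
    score_fn: Callable[[List[str]], List[float]] = frequency_scores,
) -> Tuple[List[str], str]:
    """Return a new (article, summary) pair with the lowest-scoring sentence
    removed; the reference summary is reused unchanged (an assumption)."""
    if len(sentences) < 2:
        # Removing the only sentence would leave an empty article.
        return list(sentences), summary
    scores = score_fn(sentences)
    # Index of the least important sentence (first one on ties).
    drop = min(range(len(sentences)), key=lambda i: scores[i])
    return [s for i, s in enumerate(sentences) if i != drop], summary


def augment_dataset(
    pairs: List[Tuple[List[str], str]],
) -> List[Tuple[List[str], str]]:
    """Keep every original pair and add one augmented copy per article
    whenever a sentence could actually be removed."""
    out: List[Tuple[List[str], str]] = []
    for sentences, summary in pairs:
        out.append((list(sentences), summary))
        augmented, _ = augment_by_removing_least_important(sentences, summary)
        if augmented != list(sentences):
            out.append((augmented, summary))
    return out
```

For example, with the article `["The cat sat on the mat.", "The cat is black.", "Stocks fell sharply today."]`, the stand-in scorer ranks the off-topic third sentence lowest, so the augmented article keeps only the first two sentences. Keeping the scorer pluggable makes it straightforward to compare different importance measures within the same augmentation pipeline, in the spirit of the comparison described above.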

© 2021 The Association for Natural Language Processing