Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Machine Learning Approach to Multi-Document Summarization
TSUTOMU HIRAO, HIDETO KAZAWA, HIDEKI ISOZAKI, EISAKU MAEDA, YUJI MATSUMOTO

2003 Volume 10 Issue 1 Pages 81-108

Abstract
Due to the rapid growth of the Internet and the emergence of low-priced, large-capacity storage devices, the number of online documents is exploding. Automatic summarization is the key to handling this situation. Because manual summarization is costly, we need to be able to automatically summarize a document set related to a certain event. This paper proposes a method of extracting important sentences from document sets. The method is based on Support Vector Machines, a technology that is attracting attention in the field of natural language processing. We conducted experiments using three document sets formed from twelve events published in the MAINICHI newspaper of 1999. These sets were manually processed by newspaper editors. Tests using this corpus show that our method outperforms both the Lead-based method and the TF-IDF method. Moreover, we show that reducing redundancy is not always effective for extracting important sentences from a set of multiple documents taken from a single source.
© The Association for Natural Language Processing