Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
A Term Weighting Method based on Information Gain Ratio for Summarizing Documents retrieved by IR Systems
TATSUNORI MORI

2002 Volume 9 Issue 4 Pages 3-32

Abstract
This paper proposes a new term weighting method for summarizing documents retrieved by IR systems. Unlike query-biased summarization methods, our method uses not the information in the query but the similarity information among the original documents, obtained by hierarchical clustering. To map the similarity structure of the clusters onto the weight of each word, we adopt the information gain ratio (IGR) of the probability distribution of each word as its term weight. If the amount of information of a word in a cluster increases after the cluster is partitioned into sub-clusters, the word can be regarded as contributing to determining the structure of the sub-clusters. The IGR measures the degree of this contribution. We demonstrate the effectiveness of our IGR-based method by comparing it with other systems in the Text Summarization Challenge of NTCIR2.
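The idea of scoring a word by how much its distribution helps explain a cluster split can be illustrated with a small sketch. The abstract does not give the exact formulation, so the code below assumes the standard decision-tree definition of gain ratio (information gain divided by split information) and assumes a word's distribution in a cluster is modeled as a simple occurrence probability over tokens. The names `binary_entropy` and `information_gain_ratio` are hypothetical, and each sub-cluster is represented as a flat list of tokens.

```python
import math


def binary_entropy(p):
    """Entropy (in bits) of a word occurring with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def information_gain_ratio(word, subclusters):
    """Gain ratio of `word` when a cluster is split into `subclusters`.

    Assumption (not taken from the paper): each sub-cluster is a flat list
    of tokens, and the parent cluster is the concatenation of all of them.
    """
    sizes = [len(tokens) for tokens in subclusters]
    total = sum(sizes)
    if total == 0:
        return 0.0

    counts = [tokens.count(word) for tokens in subclusters]

    # Entropy of the word's occurrence in the parent cluster.
    parent_entropy = binary_entropy(sum(counts) / total)

    # Size-weighted entropy of the word's occurrence in the sub-clusters.
    child_entropy = sum(
        (size / total) * binary_entropy(count / size)
        for count, size in zip(counts, sizes)
        if size > 0
    )
    gain = parent_entropy - child_entropy

    # Split information: entropy of the partition itself.
    split_info = -sum(
        (size / total) * math.log2(size / total) for size in sizes if size > 0
    )
    if split_info == 0.0:
        return 0.0
    return gain / split_info


# Toy cluster split into two sub-clusters (made-up data for illustration).
toy_subclusters = [
    ["apple", "apple", "pie", "pie"],
    ["car", "road", "pie", "pie"],
]
igr_apple = information_gain_ratio("apple", toy_subclusters)
igr_pie = information_gain_ratio("pie", toy_subclusters)
```

In this toy split, "apple" appears in only one sub-cluster and so receives a positive weight, while "pie" is spread evenly across both and receives a weight of zero, which matches the intuition that a word is weighted by how much it contributes to the sub-cluster structure.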
© The Association for Natural Language Processing