Abstract
This paper proposes a new term weighting method for summarizing documents retrieved by IR systems. Unlike query-biased summarization methods, our method does not use information from the query; instead, it uses the similarity structure among the original documents, obtained by hierarchical clustering. To map the similarity structure of the clusters onto the weight of each word, we adopt the information gain ratio (IGR) of the probability distribution of each word as a term weight. If the amount of information of a word in a cluster increases after the cluster is partitioned into sub-clusters, we may consider that the word contributes to determining the structure of the sub-clusters. The IGR is a measure of the degree of this contribution. We show the effectiveness of our IGR-based method by comparing it with other systems in the Text Summarization Challenge of NTCIR2.
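To make the IGR idea concrete, the following is a minimal sketch rather than the exact formulation used in this paper. It assumes the standard gain-ratio definition (information gain divided by split information), and it assumes that the distribution of a word is modeled as a binary variable: a token either is the word or is not. The names `binary_entropy`, `word_entropy`, and `information_gain_ratio`, as well as the representation of a cluster as a list of token lists, are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a binary variable with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def word_entropy(word: str, tokens: Sequence[str]) -> float:
    """Entropy of the event 'a token is `word`' within one cluster.

    Assumption: a cluster is represented simply as its list of tokens.
    """
    if not tokens:
        return 0.0
    return binary_entropy(tokens.count(word) / len(tokens))


def information_gain_ratio(word: str, subclusters: List[Sequence[str]]) -> float:
    """Gain ratio of `word` when a cluster is split into `subclusters`.

    Assumption: the parent cluster is the concatenation of its sub-clusters,
    and sub-cluster sizes are measured in tokens.
    """
    parent = [tok for sub in subclusters for tok in sub]
    total = len(parent)
    if total == 0:
        return 0.0

    # Relative size of each sub-cluster within the parent cluster.
    weights = [len(sub) / total for sub in subclusters]

    # Information gain: parent entropy minus the size-weighted entropy
    # of the sub-clusters.
    gain = word_entropy(word, parent) - sum(
        w * word_entropy(word, sub) for w, sub in zip(weights, subclusters)
    )

    # Split information: entropy of the partition itself.
    split_info = -sum(w * math.log2(w) for w in weights if w > 0.0)

    return gain / split_info if split_info > 0.0 else 0.0
```

Under these assumptions, a word that occurs in only one of two equally sized sub-clusters receives a higher score than a word spread evenly across both, which matches the intuition that the former helps determine the sub-cluster structure.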