A Term Weighting Method Based on Information Gain Ratio for Summarizing Documents Retrieved by IR Systems

森 辰則

doi:10.5715/jnlp.9.4_3

Abstract

This paper proposes a new term weighting method for summarizing documents retrieved by IR systems. Unlike query-biased summarization methods, our method does not use information from the query. Instead, it uses the similarity structure among the retrieved documents, obtained by hierarchical clustering. To map the similarity structure of the clusters onto the weight of each word, we adopt the information gain ratio (IGR) of the probability distribution of each word as its term weight. If the amount of information a word carries in a cluster increases after the cluster is partitioned into sub-clusters, we may consider that the word contributes to determining the structure of the sub-clusters. The IGR is a measure of the degree of this contribution. We show the effectiveness of our IGR-based method by comparison with other systems in the Text Summarization Challenge of NTCIR2.
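As a rough illustration of the idea, the following Python sketch computes an information gain ratio for one word when a cluster is split into sub-clusters. It is not the paper's implementation: it assumes the standard decision-tree form of the IGR (information gain divided by split information) and models each document as a set of words, so a word's distribution is reduced to presence or absence. The function names `entropy` and `information_gain_ratio` and the toy data are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain_ratio(word, subclusters):
    """IGR of `word` for a partition of a cluster into `subclusters`.

    Assumption: each sub-cluster is a list of documents, and each document
    is a set of words (presence/absence only, a simplification).
    """
    sizes = [len(c) for c in subclusters]
    total = sum(sizes)
    if total == 0:
        return 0.0

    def occurrence_entropy(docs):
        # Entropy of "does this document contain the word?" over docs.
        if not docs:
            return 0.0
        p = sum(1 for d in docs if word in d) / len(docs)
        return entropy([p, 1.0 - p])

    # The parent cluster is the union of its sub-clusters.
    parent = [d for c in subclusters for d in c]
    h_parent = occurrence_entropy(parent)

    # Size-weighted entropy remaining after the partition.
    h_children = sum(
        (len(c) / total) * occurrence_entropy(c) for c in subclusters
    )
    gain = h_parent - h_children

    # Split information normalizes the gain by the entropy of the partition.
    split_info = entropy([s / total for s in sizes])
    return gain / split_info if split_info > 0 else 0.0


# Toy cluster: "a" and "b" separate the two sub-clusters, "x" and "y" do not.
toy_subclusters = [
    [{"a", "x"}, {"a", "y"}],
    [{"b", "x"}, {"b", "y"}],
]
igr_a = information_gain_ratio("a", toy_subclusters)
igr_x = information_gain_ratio("x", toy_subclusters)
```

In this toy example, `igr_a` is 1.0 because the word "a" fully explains the partition, while `igr_x` is 0.0 because "x" is spread evenly across both sub-clusters. Under the stated assumptions, a higher value marks a word that contributes more to the sub-cluster structure and would therefore receive a larger term weight.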
