Finding Rare Information from the Web Using Social Bookmarks and Word Co-occurrence

Takayuki YUMOTO; Takahiro YAMANAKA; Manabu NII; Naotake KAMIURA

doi:10.24466/ijbschs.22.1_9

Abstract

We propose methods to find rare information on the Web using two approaches. We define rare information as information that is both relevant and atypical. In the first approach, we use social bookmark data. In particular, we focus on tags, which are sometimes used to annotate the topics of pages. The second approach is based on word co-occurrence in a corpus. In both approaches, we use conditional probabilities to express relevancy and atypicality. In experiments, we compared our methods with a relevance-oriented method, a diversity-oriented method, and another rarity-oriented method. We prepared typical and atypical pages for ten queries and ranked them using each method. Our methods using word co-occurrence obtained better nDCG scores than the other methods.
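For readers unfamiliar with the evaluation measure, the following is a minimal sketch of how an nDCG score can be computed for one ranked list. It assumes the common formulation with linear gain and a log2 position discount; the abstract does not state which nDCG variant or gain values the authors used, so the function names and the example gains (1 for a relevant, atypical page, 0 otherwise) are illustrative assumptions only.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: gain at rank i (1-based) divided by log2(i + 1)."""
    # enumerate() is 0-based, so the discount for position i is log2(i + 2).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(gains):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(gains, reverse=True))
    # A list with no positive gain has no meaningful ideal ranking; return 0.0.
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Hypothetical gains for one query, in ranked order (assumed labels, not from the paper):
# 1 = page judged relevant and atypical, 0 = typical page.
ranking_a = [1, 1, 0, 0]  # rare pages ranked first
ranking_b = [0, 0, 1, 1]  # rare pages ranked last

print(ndcg(ranking_a))
print(ndcg(ranking_b))
```

Under this formulation, a ranking that places all high-gain pages first scores 1.0, and pushing them down the list lowers the score, which is why the measure suits a comparison of ranking methods over prepared typical and atypical pages.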
