Abstract
We propose two approaches to finding rare information on the Web, where we define rare information as information that is both relevant and atypical. The first approach uses social bookmark data; in particular, we focus on tags, which are sometimes used to annotate the topics of pages. The second approach is based on word co-occurrence in a corpus. In both approaches, we use conditional probabilities to express relevance and atypicality. In experiments, we compared our methods with a relevance-oriented method, a diversity-oriented method, and another rarity-oriented method. We prepared typical and atypical pages for ten queries and ranked them with each method. Our methods based on word co-occurrence obtained better nDCG scores than the other methods.