Relevance feedback is a method for improving retrieval accuracy by using sample documents, chosen by the system in response to an initial query, that the user has judged as relevant or irrelevant. The retrieval accuracy achieved by relevance feedback depends on how the samples presented for user judgement are selected. Relevance sampling, a commonly used selection method, asks users to label the documents classified as most likely to be relevant. Uncertainty sampling, which instead selects the documents whose classification is most uncertain, has been reported to be more effective than relevance sampling. However, both methods may select many similar documents, which leaves room for improvement in retrieval accuracy. In this paper, we propose ‘unfamiliar sampling’, a method that evaluates the distance between each pair of documents and removes a candidate from sampling if it is the nearest neighbor of any sample that has already been selected. This procedure prevents multiple similar documents from being used as samples and thereby improves retrieval accuracy. In applying relevance feedback to document retrieval, it is important to achieve high retrieval accuracy with a small number of sample documents. We also propose ‘Rocchio-Boost’, which uses Rocchio feedback as the weak learner of AdaBoost, and show that it achieves high retrieval accuracy. Empirical results on the NPL test collection show that the proposed methods improve the average precision of retrieval by 6% over the original relevance sampling and Rocchio feedback.
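The selection rule behind unfamiliar sampling can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance measure, the candidate ordering, or the pool in which the nearest neighbor is searched, so the sketch assumes cosine distance, candidates visited in descending order of relevance score, and a nearest-neighbor search over all other documents. The names `cosine_distance` and `unfamiliar_sampling` are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two term-weight vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def unfamiliar_sampling(vectors, scores, k):
    """Select up to k document indices for user judgement.

    Candidates are visited in descending score order (an assumption).
    After a document is selected, its nearest neighbor is excluded
    from later selection, so near-duplicates are not both sampled.
    """
    ranked = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    selected = []
    excluded = set()
    for i in ranked:
        if len(selected) == k:
            break
        if i in excluded:
            continue
        selected.append(i)
        others = [j for j in range(len(vectors)) if j != i]
        if others:
            nearest = min(
                others, key=lambda j: cosine_distance(vectors[i], vectors[j])
            )
            excluded.add(nearest)
    return selected


# Toy data: documents 0 and 1 are near-duplicates, as are 2 and 3.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_scores = [0.9, 0.8, 0.7, 0.6]
print(unfamiliar_sampling(toy_vectors, toy_scores, k=2))  # prints [0, 2]
```

On this toy data, plain relevance sampling would take the two top-scored documents, 0 and 1, which are nearly identical. The sketch skips document 1 because it is the nearest neighbor of document 0, and selects document 2 instead.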