Abstract
We propose a new measure for estimating the level of public interest in a given document. Although personal interests vary greatly, public interest, that is, the aggregate of personal interests, is consistent to some extent over time. The task here is not to decide whether a given document is of interest, but to estimate how much interest it attracts; we expect our measure to enable deeper interest analysis. This problem has many applications, such as display control of documents on the Web, which are assumed to be seen by the public. In this paper we use a document collection annotated with ranking information that reflects public interest. Using this ranking information, we estimate the level of interest for each word and then for each document. We use three kinds of feature sets: content words, compound words, and their combination. In the evaluation we use newspaper rankings as the source and measure performance by comparing our output with the actual ranking. The results show that the extended rank coefficient between these two rankings is 0.867. We also show that an accuracy of more than 0.90 is attained in rejecting documents of little interest.
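
The two-stage estimation described above (word-level interest first, then document-level interest) can be pictured with a minimal sketch. Everything in it is an assumption made for illustration, not the measure proposed in this paper: the linear mapping from rank to score, the averaging at both stages, the function names, and the use of plain Spearman rank correlation as a stand-in for the extended rank coefficient.

```python
from collections import defaultdict


def rank_to_score(rank, n_docs):
    """Map a rank in 1..n_docs to a score in [0, 1]; rank 1 gives 1.0 (assumed mapping)."""
    if n_docs == 1:
        return 1.0
    return (n_docs - rank) / (n_docs - 1)


def estimate_word_interest(ranked_docs):
    """ranked_docs: token lists ordered from most to least public interest.

    Assumption: a word's interest is the mean score of the documents containing it.
    """
    n_docs = len(ranked_docs)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rank, tokens in enumerate(ranked_docs, start=1):
        score = rank_to_score(rank, n_docs)
        for word in set(tokens):  # count each word once per document
            totals[word] += score
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}


def estimate_doc_interest(tokens, word_interest, default=0.5):
    """Average the interest of known words; return `default` if none are known."""
    known = [word_interest[w] for w in tokens if w in word_interest]
    if not known:
        return default
    return sum(known) / len(known)


def spearman(xs, ys):
    """Spearman rank correlation of two equal-length score lists (n >= 2, no ties)."""
    n = len(xs)

    def ranks(values):
        order = sorted(range(n), key=lambda i: values[i], reverse=True)
        result = [0] * n
        for position, index in enumerate(order, start=1):
            result[index] = position
        return result

    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Toy ranked collection (hypothetical data), most interesting document first.
training = [
    ["election", "result"],
    ["election", "weather"],
    ["weather", "forecast"],
]
word_interest = estimate_word_interest(training)
doc_score = estimate_doc_interest(["election", "result"], word_interest)
```

In this sketch the token lists could hold content words, compound words, or both, mirroring the three feature sets; a threshold on the document score would give a simple way to reject documents of little interest.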