Journal of Japan Society for Fuzzy Theory and Intelligent Informatics
Online ISSN : 1881-7203
Print ISSN : 1347-7986
ISSN-L : 1347-7986
Original Papers
The Automatic Page Grouping System for the Result of WEB Retrieval Using Vector Space Model Method and Fuzzy Reasoning
Hiroo JOICHI, Tsutomu MIYOSHI
JOURNAL FREE ACCESS

2006 Volume 18 Issue 2 Pages 184-195

Abstract
Search engines are the main tool for Web page retrieval, yet a frequently noted problem is that the pages a user needs do not appear near the top of the retrieval results. One reason is that pages are selected merely because they contain the search keywords. Even when users enter the same keywords, different kinds of pages tend to be mixed in the results because of the polysemy or ambiguity of words.
To improve retrieval results, there are some studies that classify the results into groups according to page content using the vector space model method. The vector space model method measures the degree of similarity between pages from the frequency of word appearance; however, it has two problems. The first is that the cost of calculating similarity is too high, because every word that appears even once in any of the pages is used. Calculation cost should be reduced, because a quick response is important for Web retrieval. In our system, we tried to reduce the computational cost by selecting words using fuzzy reasoning. The second is that it is difficult to show a group name or title to the user. Since the method only calculates the similarity between pages, it cannot choose words that represent a group. For the user's convenience, it is desirable to add a technique for creating a group name or title automatically. In our method, we tried to create group names automatically by using the frequency of word co-occurrence.
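The similarity computation described above can be illustrated with a minimal sketch. It assumes pages are already tokenized into word lists, and it uses a plain top-k frequency filter as a stand-in for the fuzzy-reasoning word selector, whose rules the abstract does not give. The names `select_index_words`, `tf_vector`, and `cosine`, and the toy pages, are hypothetical.

```python
import math
from collections import Counter


def select_index_words(pages, k):
    """Stand-in selector (NOT the paper's fuzzy reasoning): keep the k most frequent words."""
    totals = Counter()
    for words in pages:
        totals.update(words)
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


def tf_vector(words, index_words):
    """Term-frequency vector of one page over the selected index words."""
    counts = Counter(words)
    return [counts[w] for w in index_words]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy retrieval result for the ambiguous keyword "jaguar" (illustrative data only).
pages = [
    "jaguar car engine speed car".split(),
    "jaguar car dealer price".split(),
    "jaguar cat jungle animal cat".split(),
]
index_words = select_index_words(pages, k=4)
vectors = [tf_vector(p, index_words) for p in pages]
```

With this toy data, the two car pages are more similar to each other than either is to the animal page. Restricting the vectors to k index words keeps each similarity computation at k multiplications, which is the kind of cost reduction the abstract motivates.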
In this paper, we propose a system that classifies retrieval results into groups according to page content by combining three methods: fuzzy reasoning for selecting index words, the frequency of word co-occurrence for creating group indexes, and the vector space model method for classifying pages. From the experiments, we confirmed two points. First, 200 is a suitable number of selected words from the viewpoint of both calculation cost and classification accuracy. Second, the proposed system classifies the retrieved pages in a way similar to human judgment.
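As a rough illustration of naming a group from word co-occurrence, the sketch below counts how often pairs of index words appear together in the same page of a group and uses the most frequent pair as the label. The abstract does not specify the actual procedure, so this pair-counting rule and the name `group_label` are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def group_label(group_pages, index_words):
    """Return the most frequently co-occurring pair of index words as a label.

    Assumed rule (not taken from the paper): two words co-occur when both
    appear in the same page; ties are broken alphabetically.
    """
    allowed = set(index_words)
    pair_counts = Counter()
    for words in group_pages:
        # Each word counts once per page; sorting gives every pair a canonical order.
        present = sorted(set(words) & allowed)
        pair_counts.update(combinations(present, 2))
    if not pair_counts:
        return None
    best_pair, _ = min(pair_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(best_pair)


# Toy group of two pages about cars (illustrative data only).
car_group = [
    ["jaguar", "car", "engine"],
    ["jaguar", "car", "dealer"],
]
label = group_label(car_group, ["jaguar", "car", "engine", "dealer"])
```

Here the pair of "car" and "jaguar" is the only one that appears in both pages, so it becomes the label.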
© 2006 Japan Society for Fuzzy Theory and Intelligent Informatics