Abstract
Retrieval based on semantic similarity between words (hereafter, similarity-based retrieval) is an important problem in document retrieval. Previous research on semantic similarity used word-similarity measures defined over thesauri with a balanced hierarchical structure, and showed their effectiveness in applications such as language translation and document retrieval. This paper proposes a general measure of similarity that is applicable to both balanced and unbalanced thesauri. In the proposed measure, the fewer the concepts under the most specific common abstraction of the concepts of two words, the greater the similarity between the words. The authors have implemented a similarity-based retrieval system using this semantic similarity and a large-scale thesaurus, the EDR thesaurus. Moreover, to improve its accuracy, they have incorporated a word sense disambiguation method into the retrieval system. The retrieval system is built on a conventional system, an extended Boolean retrieval system that uses the physical nearness between words and the weights of words. Through comparative experiments against the extended Boolean system, the authors have shown that the proposed similarity-based method improves both recall and precision.
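
The idea behind the proposed measure can be illustrated with a minimal sketch. The abstract does not state the exact formula, so the code below assumes, purely for illustration, that similarity is the reciprocal of the number of concepts under the most specific common abstraction, counting that abstraction itself. The toy thesaurus `PARENT` and all function names are hypothetical and are not taken from the EDR thesaurus or from the authors' system.

```python
from typing import Dict, List, Optional

# Hypothetical toy thesaurus: child concept -> parent concept (None marks the root).
# It is deliberately unbalanced: "artifact" has a deeper subtree than "animal".
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "vehicle": "artifact",
    "tool": "artifact",
    "car": "vehicle",
    "truck": "vehicle",
}


def path_to_root(concept: str) -> List[str]:
    """Return the concept followed by its ancestors, ending at the root."""
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def most_specific_common_abstraction(a: str, b: str) -> str:
    """Return the deepest concept that is an ancestor of (or equal to) both a and b."""
    ancestors_of_b = set(path_to_root(b))
    # path_to_root(a) runs from specific to general, so the first match is the deepest.
    for node in path_to_root(a):
        if node in ancestors_of_b:
            return node
    raise ValueError("concepts share no common abstraction")


def subtree_size(concept: str) -> int:
    """Count the concept itself plus every concept below it."""
    return sum(1 for c in PARENT if concept in path_to_root(c))


def similarity(a: str, b: str) -> float:
    """Assumed form: fewer concepts under the common abstraction -> higher similarity."""
    return 1.0 / subtree_size(most_specific_common_abstraction(a, b))
```

In this toy data, `similarity("dog", "cat")` is 1/3 because their common abstraction `animal` covers three concepts, `similarity("car", "tool")` is 1/5 via `artifact`, and `similarity("dog", "car")` is 1/9 via the root. Because the score depends on the number of concepts under the abstraction and not on its depth, it does not require the hierarchy to be balanced.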