Abstract
Several previous studies have measured the semantic similarity between
words by representing their concepts as points in a multidimensional
vector space acquired from text data such as electronic dictionaries or
text corpora. A central problem in these studies is how to select the
orthonormal basis vectors of the space, which represent the attributes
of the words. We propose a method for building the space that combines
two representative approaches: one based on singular value
decomposition (SVD) and the other based on the contents of a thesaurus.
The proposed method was evaluated on two tasks, similar-word retrieval
and document retrieval. In both tasks, the combined method proved more
effective than either of the original methods alone.
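To make the SVD-based half of the approach concrete, the following is a minimal sketch, assuming a word-by-context co-occurrence matrix as input and cosine similarity as the similarity measure. It does not reproduce the proposed combination with the thesaurus, whose details are not given here. The toy counts, the function names (`svd_word_space`, `cosine`, `most_similar`) and the choice of k are hypothetical and used for illustration only.

```python
import numpy as np


def svd_word_space(cooc: np.ndarray, k: int) -> np.ndarray:
    """Map each word (row of cooc) to a k-dimensional vector via truncated SVD.

    Hypothetical helper: rows of U_k * S_k serve as word coordinates along
    the top-k orthonormal basis directions found by the decomposition.
    """
    u, s, _vt = np.linalg.svd(cooc, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return float(a @ b / (na * nb))


def most_similar(vectors: np.ndarray, words: list, query: str, top_n: int = 2):
    """Rank all other words by cosine similarity to the query word."""
    q = vectors[words.index(query)]
    scores = [(w, cosine(q, vectors[i])) for i, w in enumerate(words) if w != query]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical toy data: rows are words, columns are contexts
# ("drive", "engine", "eat", "sweet").
words = ["car", "automobile", "banana", "fruit"]
cooc = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 4.0],
    [0.0, 0.0, 4.0, 5.0],
])
vectors = svd_word_space(cooc, k=2)
print(most_similar(vectors, words, "car", top_n=1))
```

Here "car" and "automobile" share contexts and therefore land on the same direction in the reduced space, while "banana" ends up orthogonal to them. A thesaurus-based space would instead derive its basis from the thesaurus categories; how the two spaces are merged is the subject of the proposed method and is not modeled in this sketch.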