人工知能 (Artificial Intelligence)
Online ISSN : 2435-8614
Print ISSN : 2188-2266
Formerly 人工知能学会誌 (Journal of the Japanese Society for Artificial Intelligence), 1986-2013, Print ISSN: 0912-8085
Optimal Structure of a Concept Base for Word Meanings, Derived from Its Ability to Discriminate Similarity
石川 勉, 井澤 潤次朗, 笠原 要 (Kaname Kasahara)
Commentary / general-interest journal, free access

1998, Volume 13, Issue 3, pp. 470-479

Abstract

This paper examines, analytically and experimentally, the optimal structure of a large-scale knowledge base of words that is constructed automatically from machine-readable dictionaries. In this knowledge base, each word is represented by a set of weighted keywords. Each keyword is related to the word in some way, and its weight represents the strength of that relationship. When constructing this kind of knowledge base, it is important to select the optimal set of keywords used to represent every word, taking into account how well the knowledge base can measure the semantic similarity between words. Our analysis, which uses a simplified probabilistic model of the knowledge base, shows that as the knowledge base grows, that is, as the total number of words or the average number of keywords per word increases, the optimal keyword set becomes smaller and consists of keywords from higher levels of the conceptual hierarchy. In addition, an experiment using six knowledge bases derived from a previously constructed knowledge base of 40,000 everyday Japanese words has verified that an optimal keyword set exists. This indicates that the analysis is useful for designing knowledge bases in which each word is represented by a vector. Finally, both a subjective evaluation based on human judgment and a newly proposed objective evaluation using a published synonym dictionary show that a set of about 2,000 keywords is optimal for a knowledge base of this size.
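
The representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the similarity measure, so cosine similarity is assumed here, and the toy entries, the `parent_of` hierarchy, and the function names `cosine_similarity` and `coarsen` are all hypothetical. The sketch shows words as weighted keyword vectors and how replacing keywords with higher-level ones shrinks the keyword set and changes the measured similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse keyword-weight vectors (dicts).

    Assumption: cosine is used only for illustration; the abstract does not
    specify which similarity measure the paper uses.
    """
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def coarsen(vector, parent_of):
    """Map each keyword to its higher-level keyword and sum the weights.

    Keywords without an entry in parent_of are kept unchanged.
    """
    out = {}
    for keyword, weight in vector.items():
        parent = parent_of.get(keyword, keyword)
        out[parent] = out.get(parent, 0.0) + weight
    return out


# Hypothetical toy knowledge base: word -> {keyword: weight}.
knowledge_base = {
    "dog": {"animal": 0.6, "pet": 0.3, "bark": 0.1},
    "cat": {"animal": 0.6, "pet": 0.3, "meow": 0.1},
    "car": {"vehicle": 0.7, "engine": 0.3},
}

# Hypothetical fragment of a conceptual hierarchy: keyword -> higher-level keyword.
parent_of = {"bark": "sound", "meow": "sound"}

fine = cosine_similarity(knowledge_base["dog"], knowledge_base["cat"])
coarse = cosine_similarity(
    coarsen(knowledge_base["dog"], parent_of),
    coarsen(knowledge_base["cat"], parent_of),
)
unrelated = cosine_similarity(knowledge_base["dog"], knowledge_base["car"])
```

In this toy case, merging "bark" and "meow" into one higher-level keyword makes the two vectors identical, so `coarse` is higher than `fine`. The same merging can also erase distinctions between words that are not similar, which is the kind of trade-off that makes the choice of keyword set an optimization problem.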

© 1998 人工知能学会 (The Japanese Society for Artificial Intelligence)