人工知能学会論文誌
Online ISSN : 1346-8030
Print ISSN : 1346-0714
論文
単語親密度に基づく基本語彙の選定
佐藤 浩史笠原 要金杉 友子天野 成昭
著者情報
ジャーナル フリー

2004 年 19 巻 6 号 p. 502-510

詳細
抄録

This paper proposes a new method for selecting fundamental vocabulary. We are presently constructing the Fundamental Vocabulary Knowledge-base of Japanese that contains integrated information on syntax, semantics and pragmatics, for the purposes of advanced natural language processing. This database mainly consists of a lexicon and a treebank: Lexeed (a Japanese Semantic Lexicon) and the Hinoki Treebank. Fundamental vocabulary selection is the first step in the construction of Lexeed. The vocabulary should include sufficient words to describe general concepts for self-expandability, and should not be prohibitively large to construct and maintain. There are two conventional methods for selecting fundamental vocabulary. The first is intuition-based selection by experts. This is the traditional method for making dictionaries. A weak point of this method is that the selection strongly depends on personal intuition. The second is corpus-based selection. This method is superior in objectivity to intuition-based selection, however, it is difficult to compile a sufficiently balanced corpora. We propose a psychologically-motivated selection method that adopts word familiarity as the selection criterion. Word familiarity is a rating that represents the familiarity of a word as a real number ranging from 1 (least familiar) to 7 (most familiar). We determined the word familiarity ratings statistically based on psychological experiments over 32 subjects. We selected about 30,000 words as the fundamental vocabulary, based on a minimum word familiarity threshold of 5. We also evaluated the vocabulary by comparing its word coverage with conventional intuition-based and corpus-based selection over dictionary definition sentences and novels, and demonstrated the superior coverage of our lexicon. Based on this, we conclude that the proposed method is superior to conventional methods for fundamental vocabulary selection.

著者関連情報
© 2004 JSAI (The Japanese Society for Artificial Intelligence)
前の記事 次の記事
feedback
Top