Abstract
Documents in many specialized fields are now published on the Web and updated frequently. As a result, many field-specific new words appear that are not listed in dictionaries. When a document containing such words is processed by computer for purposes such as indexing and information extraction, handling these new words becomes a problem. This paper proposes a method for extracting new words from category names in a large-scale Web directory service. The method is based on several characteristics of a word, such as the number of hits it returns in a Google search, the number of categories in the directory service that contain the word, and its part-of-speech pattern. Experimental results show that these characteristics are effective for extracting new words.