Information and Media Technologies
Online ISSN : 1881-0896
ISSN-L : 1881-0896
Media (processing) and Interaction
Construction of a Domain Dictionary for Fundamental Vocabulary and its Application to Automatic Blog Categorization Using Dynamically Estimated Domains of Unknown Words
Chikara Hashimoto, Sadao Kurohashi
Journal, free access

2014 Volume 9 Issue 4 Pages 712-735


Semantic relations between words are essential for natural language understanding. Toward deeper natural language understanding, we semi-automatically constructed a domain dictionary that represents the domain relations between fundamental Japanese words. Our method does not require a document collection. As a task-based evaluation of the domain dictionary, we categorized blogs by assigning a domain to each word in a blog article and categorizing the article into its most dominant domain. In doing so, we dynamically estimated the domains of unknown words (i.e., those not listed in the domain dictionary). Our blog categorization achieved an accuracy of 94.0% (564/600), and the domain estimation technique for unknown words achieved an accuracy of 76.6% (383/500).
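
The categorization step described in the abstract (assign a domain to each word, then pick the most dominant domain) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the toy dictionary `DOMAIN_DICT`, its English entries and domain labels, the function `categorize`, and the `estimate_unknown` callback. The abstract does not describe how unknown-word domains are actually estimated, so the callback is only a placeholder for that technique, and plain majority voting is assumed as the meaning of "most dominant".

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional

# Hypothetical toy dictionary (word -> domain). The real dictionary covers
# fundamental Japanese vocabulary; these entries and labels are illustrative.
DOMAIN_DICT: Dict[str, str] = {
    "pitcher": "SPORTS",
    "inning": "SPORTS",
    "recipe": "CUISINE",
    "simmer": "CUISINE",
    "election": "POLITICS",
}


def categorize(
    words: Iterable[str],
    domain_dict: Dict[str, str],
    estimate_unknown: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Return the domain assigned to the most words, or None if no word has one."""
    votes: Counter = Counter()
    for word in words:
        domain = domain_dict.get(word)
        # Unknown word: fall back to a dynamic estimator if one is supplied.
        # (Placeholder for the paper's technique, which is not shown here.)
        if domain is None and estimate_unknown is not None:
            domain = estimate_unknown(word)
        if domain is not None:
            votes[domain] += 1
    if not votes:
        return None
    # Majority vote over the per-word domains.
    return votes.most_common(1)[0][0]


def guess(word: str) -> Optional[str]:
    """Stand-in estimator that knows two extra sports words (assumption)."""
    return "SPORTS" if word in {"umpire", "shortstop"} else None


article = ["recipe", "umpire", "shortstop"]
print(categorize(article, DOMAIN_DICT))         # only "recipe" is known
print(categorize(article, DOMAIN_DICT, guess))  # unknown words now vote too
```

In this sketch, words without a domain simply do not vote, which is one simple way to handle domain-neutral vocabulary. The second call shows why unknown-word estimation matters: two words that the dictionary misses are enough to change the outcome of the vote.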

© 2014 The Association for Natural Language Processing