Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Construction of Domain Dictionary for Fundamental Vocabulary and its Application to Automatic Blog Categorization with the Dynamic Estimation of Unknown Words' Domains
CHIKARA HASHIMOTO, SADAO KUROHASHI

2008 Volume 15 Issue 5 Pages 73-97

Abstract
For natural language understanding, it is essential to reveal the semantic relations between words. To date, the IS-A relation is the only semantic relation that has been made publicly available in the form of a thesaurus. Toward deeper natural language understanding, we semi-automatically constructed a domain dictionary that represents the domain relations among Japanese fundamental words. Our method does not require a document collection. As a task-based evaluation of the domain dictionary, we performed blog categorization: we assigned a domain to each word in a blog article and categorized the article into its most dominant domain. In doing so, we dynamically estimated the domains of unknown words, i.e., words not listed in the domain dictionary. As a result, our blog categorization achieved an accuracy of 94.0% (564/600), and the domain estimation technique for unknown words achieved an accuracy of 76.6% (383/500).
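The categorization step summarized above (assign each word a domain, then label the article with its most frequent domain) can be sketched as a simple majority vote. This is a minimal illustrative sketch under stated assumptions, not the authors' implementation: the dictionary contents, the English example words, and the `estimate_unknown` callback are hypothetical. The callback only stands in for the paper's dynamic estimation of unknown words, whose actual method is not described in this abstract.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional


def categorize(
    words: Iterable[str],
    domain_dict: Dict[str, str],
    estimate_unknown: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Return the most frequent domain among the words of one article.

    Sketch only: domain_dict is a hypothetical word -> domain mapping, and
    estimate_unknown is a hypothetical hook for words missing from it.
    """
    counts: Counter = Counter()
    for word in words:
        domain = domain_dict.get(word)
        if domain is None and estimate_unknown is not None:
            # Hypothetical stand-in for the paper's unknown-word estimation.
            domain = estimate_unknown(word)
        if domain is not None:
            counts[domain] += 1
    if not counts:
        # No word could be assigned a domain, so the article stays unlabeled.
        return None
    # most_common(1) returns [(domain, count)]; ties keep first-seen order.
    return counts.most_common(1)[0][0]
```

With a toy dictionary `{"pitcher": "sports", "inning": "sports", "recipe": "cooking"}`, calling `categorize(["pitcher", "inning", "recipe", "blog"], toy)` returns `"sports"`. The unknown word "blog" is simply skipped when no estimator is supplied; passing an estimator lets unknown words contribute votes as well.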
© The Association for Natural Language Processing