Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
An Efficient Method of Determining Field Association Terms of Compound Words
TAKAKO TSUJI, MASAO FUKETA, KAZUHIRO MORITA, JUN-ICHI AOE
JOURNAL FREE ACCESS

2000 Volume 7 Issue 2 Pages 3-26

Abstract
Much research on text classification relies on term information drawn from the whole text, yet humans can recognize the field of a text from only a small number of specific words in it. In this paper, such a word, one that can be directly related to the field of a text, is called a field association (FA) term. Single-word FA terms can be collected because their number is finite, but selecting useful compound FA terms from the huge number of combinations of single-word FA terms is difficult. Five association levels are defined for FA terms, and two kinds of ranks, based on stability and inheritance, are presented. Using the levels and ranks, redundant candidates for compound FA terms can be removed to a remarkable degree. Simulation results on Japanese text files from 180 fields show that the 88,782 candidates for compound FA terms can be reduced to 8,405, about 9% of the original number, with recall and precision above 0.77 and 0.90, respectively. Experimental results of field determination using FA terms on 264 text fragments show that the presented method attains an accuracy of more than 90%, about 30% higher than when only single-word FA terms are used.
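The reduction figure reported above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the two candidate counts come from the abstract, while the `precision_recall` helper and its set-based definitions are an assumption about how recall and precision are measured, not the authors' evaluation code.

```python
# Candidate counts as reported in the abstract.
total_candidates = 88_782
kept_candidates = 8_405

# Fraction of compound FA term candidates that survive filtering.
reduction_ratio = kept_candidates / total_candidates
print(f"kept {reduction_ratio:.1%} of the original candidates")


def precision_recall(selected: set[str], relevant: set[str]) -> tuple[float, float]:
    """Standard set-based precision and recall (assumed definitions).

    `selected` is the set of terms kept by a filter; `relevant` is the
    set of terms judged to be correct FA terms. Both are hypothetical
    inputs used only for illustration.
    """
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

With these definitions, a filter that discards many candidates can still keep precision high as long as most of the terms it retains are judged correct, which is the trade-off the reported figures describe.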
© The Association for Natural Language Processing