Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Term Extraction Based on Occurrence and Concatenation Frequency
HIROSHI NAKAGAWA, HIROAKI YUMOTO, TATSUNORI MORI

2003 Volume 10 Issue 1 Pages 27-45

Abstract
In this paper, we propose a new method for automatically recognizing domain-specific terms in a monolingual corpus. The majority of domain-specific terms are compound nouns, and these are what we aim to extract. Our method is based on single-noun statistics calculated from single-noun bigrams. Namely, we focus on how many nouns adjoin the noun in question to form compound nouns. In addition, we combine this measure with the frequency of each compound noun and single noun; we call this combination the FLR method. We experimentally evaluate these methods on the NTCIR1 TMREC test collection. The results show that the FLR method performs best when we take into account fewer than 1,400 or more than 12,000 of the highest-ranked term candidates.
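
The scoring idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give a formula, so the details here are assumptions made only for illustration: the function name `flr_scores`, the use of distinct left and right neighbour counts, the add-one smoothing, and the geometric mean over the nouns of a candidate are all hypothetical choices, not necessarily the authors' definition.

```python
from collections import defaultdict
from math import prod


def flr_scores(candidates):
    """Score term candidates by frequency times an adjacency-based weight.

    candidates: dict mapping a tuple of nouns (a single noun or a
    compound noun) to its occurrence frequency in the corpus.
    """
    # For each noun, collect the distinct nouns seen directly to its
    # left and right inside the compound-noun candidates (noun bigrams).
    left = defaultdict(set)
    right = defaultdict(set)
    for term in candidates:
        for a, b in zip(term, term[1:]):
            right[a].add(b)
            left[b].add(a)

    scores = {}
    for term, freq in candidates.items():
        # Assumed combination: geometric mean of the smoothed left and
        # right neighbour counts over all nouns in the candidate.
        product = prod((len(left[n]) + 1) * (len(right[n]) + 1) for n in term)
        lr = product ** (1.0 / (2 * len(term)))
        # Combine the adjacency weight with the candidate's own frequency.
        scores[term] = freq * lr
    return scores


# Toy input with made-up frequencies, purely for illustration.
example = {
    ("trigram",): 5,
    ("trigram", "statistics"): 3,
    ("word", "trigram"): 2,
    ("class", "trigram"): 1,
}
ranking = sorted(flr_scores(example).items(), key=lambda kv: kv[1], reverse=True)
```

In this toy input, "trigram" has two distinct left neighbours and one right neighbour, so candidates containing it receive a larger adjacency weight than their raw frequency alone would give them.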
© The Association for Natural Language Processing