2010 Volume 20 Issue 1 Pages 15-31
In this paper, we propose a method for automatic term recognition (ATR) that uses the statistical differences between the relative frequencies of terms in a target domain corpus and in corpora of other domains. Target terms appear more frequently in the target domain corpus than in other domain corpora, and exploiting this characteristic can improve extraction performance. Most ATR methods proposed so far use only the target domain corpus and do not take this characteristic into account. In our extraction experiment, we used abstracts from the Women's Studies International Forum as the target domain corpus and abstracts from academic journals in 39 domains as the non-target domain corpus. We examined extraction performance and found that our method outperformed existing ATR methods. Experiments using journals selected at random from the 39 domains confirmed that the size of the non-target domain corpus can be reduced. In particular, a corpus consisting of journals similar to the target domain achieved extraction performance almost as high as the corpus consisting of all 39 journals.
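The core idea, comparing a term's relative frequency in the target domain corpus against its relative frequency in other domain corpora, can be illustrated with a minimal sketch. The abstract does not specify the statistic the paper uses, so the smoothed log ratio below is an assumption chosen for illustration only, and the names `score_terms`, `rank_terms`, and `smoothing` are hypothetical.

```python
import math
from collections import Counter


def score_terms(target_tokens, other_tokens, smoothing=1.0):
    """Score each term seen in the target corpus by the log ratio of its
    smoothed relative frequency in the target corpus to that in the
    other-domain corpus.

    Assumption: a log ratio with additive smoothing stands in for the
    paper's statistic, which the abstract does not state.
    """
    target_counts = Counter(target_tokens)
    other_counts = Counter(other_tokens)
    vocab_size = len(set(target_counts) | set(other_counts))

    # Denominators include the smoothing mass added to every vocabulary item.
    n_target = len(target_tokens) + smoothing * vocab_size
    n_other = len(other_tokens) + smoothing * vocab_size

    scores = {}
    for term, count in target_counts.items():
        p_target = (count + smoothing) / n_target
        p_other = (other_counts[term] + smoothing) / n_other
        scores[term] = math.log(p_target / p_other)
    return scores


def rank_terms(scores):
    """Return terms sorted by descending score, ties broken alphabetically."""
    return sorted(scores, key=lambda term: (-scores[term], term))
```

Under this sketch, a positive score means the term is relatively more frequent in the target corpus and is therefore a stronger term candidate, while words that are common everywhere score near or below zero. A real system would first extract candidate terms (for example, noun phrases) rather than score raw tokens.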