Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
A Comparative Study of Automatic Extraction of Collocations from Corpora: Mutual Information vs. Cost Criteria
Kenji Kita, Yasuhiko Kato, Takashi Omoto, Yoneo Yano

1994 Volume 1 Issue 1 Pages 21-33


Corpus-based study is becoming an established methodology in natural language processing, and second language learning is one interesting potential application. In this paper, we are primarily concerned with acquiring collocational knowledge from corpora for use in language learning. We first discuss the importance of collocational knowledge in second language learning, and then take up two measures, mutual information and cost criteria, for automatically identifying and extracting collocations from corpora. We compare the two measures experimentally on both Japanese and English corpora. In our experiments, the cost criteria measure proved more effective at extracting interesting collocations such as fundamental idiomatic expressions and phrases.
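
As a rough illustration of the first of the two measures, the following minimal sketch scores adjacent word pairs by pointwise mutual information, log2(P(x, y) / (P(x) P(y))). It assumes the standard textbook definition with simple relative-frequency estimates; the abstract does not give the paper's exact formulation, and the function name `bigram_pmi`, the `min_count` parameter, and the toy token list are hypothetical. The cost criteria measure is not sketched here because the abstract does not define it.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=1):
    """Return {(w1, w2): PMI in bits} for adjacent word pairs in `tokens`.

    Assumption: probabilities are plain relative frequencies, with no
    smoothing. This is an illustrative sketch, not the paper's method.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy usage: rank pairs from highest to lowest PMI.
tokens = "a b a b c".split()
scores = bigram_pmi(tokens)
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

PMI tends to favor pairs of rare words, which is why a frequency cutoff such as `min_count` is commonly applied before ranking.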

© The Association for Natural Language Processing