The bulletin of the Kanto-koshin-etsu English Language Education Society
Online ISSN : 2433-0841
Print ISSN : 0911-2502
ISSN-L : 0911-2502
Using Statistical Measures to Extract Specialized Vocabulary from a Corpus
Kiyomi CHUJO, Masao UTIYAMA

2004 Volume 18 Pages 99-108

Abstract

Earlier studies have established that frequency of occurrence is effective in extracting specialized vocabulary from a corpus. What would happen if, rather than relying solely on frequency, a range of statistical tools were used? In this study, eight individual statistical measures and one combined measure, 'F_<cum>', were evaluated for their effectiveness in producing specialized vocabulary by comparing the extracted lists with existing specialized vocabulary control lists. The 'F_<cum>' combination of measures produced the lists most comparable to the control lists, followed in effectiveness by the Dice coefficient. All of these measures proved to be effective tools for producing useful specialized vocabulary, and each measure created a unique list with regard to frequency, word length, type of word, and school textbook vocabulary coverage. While frequency alone is an effective determiner of specialized vocabulary in a corpus, applying statistical tools is even more effective, because it yields various types of specialized lists that can be targeted to students' vocabulary or proficiency levels.
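
As a rough illustration of how one of the named measures, the Dice coefficient, might rank candidate words, the following Python sketch scores each word in a specialized corpus against a reference corpus. The abstract does not give the formula used in the study; the contingency-table form below (a = word frequency in the specialized corpus, b = word frequency in the reference corpus, c = all other tokens in the specialized corpus) is an assumption, and the function name `dice_scores` and the toy token lists are hypothetical.

```python
from collections import Counter


def dice_scores(specialized_tokens, reference_tokens):
    """Score each word in the specialized corpus with a Dice coefficient.

    Assumed formulation (not taken from the paper):
        a = frequency of the word in the specialized corpus
        b = frequency of the word in the reference corpus
        c = number of other tokens in the specialized corpus
        Dice = 2a / ((a + b) + (a + c))
    """
    spec = Counter(specialized_tokens)
    ref = Counter(reference_tokens)
    n_spec = sum(spec.values())  # total tokens in the specialized corpus

    scores = {}
    for word, a in spec.items():
        b = ref.get(word, 0)
        c = n_spec - a
        scores[word] = 2 * a / ((a + b) + (a + c))
    return scores


# Toy data, purely illustrative.
specialized = ["court", "court", "law", "the", "the"]
reference = ["the", "the", "the", "cat"]

# Print words from highest to lowest score.
ranked = sorted(dice_scores(specialized, reference).items(),
                key=lambda item: item[1], reverse=True)
for word, score in ranked:
    print(f"{word}\t{score:.3f}")
```

Under this assumed formulation, a word that is frequent in the specialized corpus but rare in the reference corpus scores higher than a word that is equally frequent in both, which is the behavior one would want from a measure used alongside raw frequency.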

© 2004 Kantokoshinetsu Association of Teachers of English