A Study of Specialized Vocabulary Extraction Using Statistical Measures

中條 清美; 内山 将夫

doi:10.20806/katejo.18.0_99

Abstract

Earlier studies have established that frequency of occurrence is an effective criterion for extracting specialized vocabulary from a corpus. What happens if a range of statistical measures is used instead of relying solely on frequency? In this study, eight individual statistical measures and one combined measure, 'F_cum', were evaluated for how well they extract specialized vocabulary. Each extracted list was compared with existing specialized vocabulary lists that served as control lists. The 'F_cum' combination produced the lists that corresponded most closely to the control lists, followed by the Dice coefficient. All of the measures proved effective for producing useful specialized vocabulary lists. Each measure also produced a distinctive list in terms of word frequency, word length, word type, and coverage of school textbook vocabulary. Frequency alone is therefore an effective criterion for extracting specialized vocabulary from a corpus, but statistical measures are even more effective. They can also extract different types of specialized lists that can be targeted to students' vocabulary or proficiency levels.
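The abstract does not give formulas for the measures, so the following Python sketch only illustrates the general idea under stated assumptions. Each word is scored from a 2x2 contingency table that compares a specialized (target) corpus with a general (reference) corpus. Here a and b are the word's counts in the target and reference corpora, and c and d are the counts of all other tokens in each. The Dice coefficient is taken as 2a / (2a + b + c), a common formulation that may differ from the one used in the paper. The ranked list is then scored by its overlap with a control list, loosely mirroring the evaluation described above. The helper names (rank_words, coverage), the toy counts, and the control list are hypothetical. The 'F_cum' combination is not reproduced because its definition is not given here.

```python
from collections import Counter
from typing import Callable, Dict, List, Set

# Assumption: a measure maps a 2x2 contingency table (a, b, c, d) to a score.
Measure = Callable[[int, int, int, int], float]


def frequency(a: int, b: int, c: int, d: int) -> float:
    """Baseline: raw frequency of the word in the target corpus."""
    return float(a)


def dice(a: int, b: int, c: int, d: int) -> float:
    """Assumed Dice formulation 2a / (2a + b + c); d is not used by this measure."""
    return 2 * a / (2 * a + b + c)


def rank_words(target: Counter, reference: Counter, measure: Measure) -> List[str]:
    """Hypothetical helper: rank every target-corpus word by `measure`, highest first."""
    n_target = sum(target.values())
    n_reference = sum(reference.values())
    scores: Dict[str, float] = {}
    for word, a in target.items():
        b = reference[word]      # occurrences in the reference corpus (0 if absent)
        c = n_target - a         # all other tokens in the target corpus
        d = n_reference - b      # all other tokens in the reference corpus
        scores[word] = measure(a, b, c, d)
    # Sort by descending score; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))


def coverage(ranked: List[str], control: Set[str], n: int) -> float:
    """Hypothetical helper: share of the top-n ranked words found in the control list."""
    top = ranked[:n]
    return sum(w in control for w in top) / len(top) if top else 0.0


# Hypothetical toy counts: a small specialized (target) corpus, a larger
# general (reference) corpus, and a hypothetical control list.
target = Counter({"the": 40, "contract": 10, "plaintiff": 6, "dog": 4})
reference = Counter({"the": 800, "dog": 150, "contract": 2, "cat": 48})
control = {"contract", "plaintiff"}

for name, measure in [("frequency", frequency), ("dice", dice)]:
    ranked = rank_words(target, reference, measure)
    print(name, ranked, coverage(ranked, control, 2))
```

On these invented counts, raw frequency ranks the function word "the" first. Dice ranks "contract" and "plaintiff" highest because it penalizes words that are common in the reference corpus. This only illustrates the mechanics and is not a result from the study.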
