Development of a Program for Calculating English Vocabulary Statistics Using Shell Scripts

岩崎 洋一

doi:10.19025/bnitk.44.0_33

Abstract

Corpora, large collections of written or spoken language, have recently come into wide use in linguistics and language education, and improvements in computer technology have made them easy for English researchers and English teachers to use. The author has developed shell script programs that run on Linux to analyze the English vocabulary of a corpus and to help English teachers prepare teaching materials. The purposes of this study were to develop a shell script program that calculates English vocabulary statistics for a corpus and to check whether its processing speed is suitable for practical use. The program developed in this study calculates the following values for a corpus: file size, tokens, types, type/token ratio (TTR), standardized TTR, mean word length, number of sentences, mean sentence length, number of paragraphs, and mean paragraph length. Test runs showed that processing time increased in proportion to corpus size, and it was concluded that the program can handle a 500,000-word English corpus.
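
Some of the statistics named in the abstract can be illustrated with a minimal shell sketch. This is not the author's program: the function names `words` and `vocab_stats` are hypothetical, a word is assumed to be a maximal run of ASCII letters (so "don't" counts as two words), case is folded before counting, and TTR is reported as the plain ratio types/tokens.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not the program described in the paper.

# Print one lowercase word per line.
# Assumption: a "word" is a maximal run of ASCII letters.
words() {
    LC_ALL=C tr -cs 'A-Za-z' '\n' < "$1" | LC_ALL=C tr 'A-Z' 'a-z' | sed '/^$/d'
}

# Print tokens, types, TTR (types / tokens), and mean word length for a file.
vocab_stats() {
    words "$1" | LC_ALL=C sort | uniq -c | awk '
        # Each input line is "<count> <word>" for one distinct word (one type).
        { tokens += $1; types += 1; chars += $1 * length($2) }
        END {
            if (tokens == 0) { print "tokens=0 types=0"; exit }
            printf "tokens=%d types=%d ttr=%.4f mean_word_length=%.2f\n",
                   tokens, types, types / tokens, chars / tokens
        }'
}
```

After sourcing these definitions, `vocab_stats corpus.txt` prints the four values on one line, where `corpus.txt` stands for any plain-text file. Sorting dominates the cost of this sketch, so its running time need not match the proportional scaling reported for the author's program.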
