THE BULLETIN OF NATIONAL INSTITUTE of TECHNOLOGY, KISARAZU COLLEGE
Online ISSN : 2188-921X
Print ISSN : 2188-9201
ISSN-L : 0285-7901
Developing a Computer Program for the Calculation of English Vocabulary Statistic Using Shell Script
Youichi IWASAKI
RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

2011 Volume 44 Pages 33-38

Abstract
Corpora, large collections of written or spoken language, are now widely used in linguistics and language education, and improvements in computer technology have made them easy for English researchers and teachers to use. The author has developed shell script programs that run on Linux to analyze the English vocabulary of a corpus and to help English teachers prepare teaching materials. The purposes of this study were to develop a shell script program that calculates English vocabulary statistics for a corpus and to check whether its processing speed is suitable for practical use. The program calculates the following values for a corpus: file size, tokens, types, type/token ratio (TTR), standardized TTR, mean word length, number of sentences, mean sentence length, number of paragraphs, and mean paragraph length. Test runs showed that processing time increased in proportion to the size of the corpus, and it was concluded that the program can handle an English corpus of 500,000 words.
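The abstract does not reproduce the author's script, so the following is only a minimal sketch of how a few of the listed values (tokens, types, TTR, and mean word length) might be computed with standard shell tools. The function name `vocab_stats`, the output labels, and the tokenization rule (a word is any run of ASCII letters, lowercased) are assumptions made for illustration, not the method used in the paper. Standardized TTR, sentence counts, and paragraph counts are not covered.

```shell
#!/bin/sh
# vocab_stats: hypothetical helper, not the author's program.
# Reads English text on stdin and prints tokens, types, TTR (%), and mean word length.
# Assumption: a "word" is a run of ASCII letters, so "don't" counts as "don" and "t".
vocab_stats() {
    tr 'A-Z' 'a-z' |            # fold case so "The" and "the" are one type
    tr -cs 'a-z' '\n' |         # replace each run of non-letters with one newline
    grep -v '^$' |              # drop the empty line left by leading non-letters
    sort | uniq -c |            # one line per distinct word: "<count> <word>"
    awk '{
        tokens += $1                # running total of word occurrences
        types  += 1                 # each line is one distinct word
        chars  += $1 * length($2)   # letters contributed by all occurrences
    }
    END {
        if (tokens == 0) { print "tokens 0"; exit }
        printf "tokens %d\n", tokens
        printf "types %d\n", types
        printf "ttr %.2f\n", 100 * types / tokens
        printf "mean_word_length %.2f\n", chars / tokens
    }'
}
```

A call such as `vocab_stats < corpus.txt` (where `corpus.txt` is a placeholder file name) would print one labeled value per line. Sorting dominates the cost here, so the running time of this sketch may not scale in the same way as the program described in the abstract.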
© 2011 National Institute of Technology, Kisarazu College