Bulletin of Kisarazu National College of Technology (木更津工業高等専門学校紀要)
Online ISSN : 2188-921X
Print ISSN : 2188-9201
ISSN-L : 0285-7901
Development of a Program for Calculating English Vocabulary Statistics Using Shell Scripts
岩崎 洋一
Research report / technical report, free access

2011, Volume 44, pp. 33-38

Abstract

Corpora, large collections of written or spoken language, have recently come into wide use in linguistics and language education. Improvements in computer technology have made it easy for researchers and teachers of English to work with corpora. The author has developed shell script programs that run on Linux to analyze the English vocabulary of a corpus and to help English teachers prepare teaching materials. The purposes of this study were to develop a shell script program that calculates English vocabulary statistics for a corpus and to check whether its processing speed is suitable for practical use. The program developed in this study calculates the following values for a corpus: file size, tokens, types, type/token ratio (TTR), standardized TTR, mean word length, number of sentences, mean sentence length, number of paragraphs, and mean paragraph length. Test runs showed that processing time increased in proportion to the size of the corpus, and it was concluded that the program can handle an English corpus of 500,000 words.
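As an illustration of how a few of these statistics can be computed with standard Unix tools, here is a minimal sketch. It is not the author's program: the function names, the definition of a word as a run of letters or apostrophes, the case folding, and the default chunk size of 1,000 words for the standardized TTR are all assumptions made for this example.

```shell
#!/bin/sh
# Illustrative sketch only; function names and tokenization rules are assumptions.

# Print one lowercase word per line. A "word" here is a run of letters or apostrophes.
words() {
    tr -cs "A-Za-z'" '\n' < "$1" | tr 'A-Z' 'a-z' | sed '/^$/d'
}

# Tokens: total number of running words.
count_tokens() { words "$1" | wc -l | tr -d ' '; }

# Types: number of distinct words.
count_types() { words "$1" | sort -u | wc -l | tr -d ' '; }

# Type/token ratio as a percentage.
ttr() {
    awk -v ty="$(count_types "$1")" -v to="$(count_tokens "$1")" \
        'BEGIN { if (to > 0) printf "%.2f\n", 100 * ty / to; else print "0.00" }'
}

# Standardized TTR: mean TTR over consecutive chunks of n words (default 1000).
# A trailing partial chunk is ignored.
sttr() {
    words "$1" | awk -v n="${2:-1000}" '
        {
            if (!($0 in seen)) { seen[$0] = 1; types++ }
            if (++count == n) {
                sum += 100 * types / n; chunks++
                count = 0; types = 0; split("", seen)   # reset for the next chunk
            }
        }
        END { if (chunks > 0) printf "%.2f\n", sum / chunks; else print "0.00" }'
}

# Mean word length in characters.
mean_word_length() {
    words "$1" | awk '{ n += length($0) }
        END { if (NR > 0) printf "%.2f\n", n / NR; else print "0.00" }'
}
```

After sourcing the file, each function takes a corpus path, for example `count_tokens corpus.txt` or `sttr corpus.txt 1000`, where `corpus.txt` is a hypothetical plain-text file. Each function makes a single linear pass over the word list (plus a sort for the type count), which is consistent with processing time growing roughly in step with corpus size.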

© 2011 Institute of National Colleges of Technology, Japan; Kisarazu National College of Technology