This paper introduces an English learner corpus built for the research and development of an e-Learning system, together with analyses of the corpus and experiments that apply it. The corpus consists of English sentences translated from Japanese by Japanese learners of English. Each learner translated 300 Japanese sentences into English, and each learner's English proficiency was measured by TOEIC. Reference translations produced by bilinguals were also collected so that translation quality could be evaluated automatically. The experiments used the automatic scores BLEU, NIST, WER, PER, METEOR, and GTM. Of these, GTM correlated most strongly with the TOEIC score, with a correlation of 0.74. Adding 4 parameters (sentence length, word length of the learners' translations, etc.) in a multiple linear regression analysis improved the correlation to 0.76.
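The two analyses described above, correlating one automatic score with TOEIC and then adding further parameters through multiple linear regression, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: every number below is invented for demonstration, only one extra parameter (mean sentence length) is used instead of the four in the paper, and the helper names (`pearson`, `fit_ols`, `predict`) are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(features, target):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)

def predict(coeffs, features):
    """Apply fitted coefficients (intercept first) to each feature row."""
    return [coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], f)) for f in features]

# Invented per-learner values, for illustration only (not data from the corpus).
gtm = [0.42, 0.48, 0.51, 0.55, 0.58, 0.61, 0.66, 0.70]       # automatic score
sent_len = [8.1, 9.4, 8.8, 9.9, 9.2, 10.5, 10.1, 11.0]       # mean sentence length
toeic = [380, 450, 520, 500, 610, 640, 720, 790]             # proficiency score

# Step 1: correlation between a single automatic score and TOEIC.
r_single = pearson(gtm, toeic)

# Step 2: multiple linear regression on the score plus an extra parameter;
# the multiple correlation is the correlation of fitted vs. observed TOEIC.
features = list(zip(gtm, sent_len))
coeffs = fit_ols(features, toeic)
r_multi = pearson(predict(coeffs, features), toeic)

print(f"single-score correlation: {r_single:.3f}")
print(f"multiple correlation:     {r_multi:.3f}")
```

When the regression includes an intercept and is evaluated on the data it was fitted to, the multiple correlation cannot be lower than the single-score correlation, so an improvement such as the one from 0.74 to 0.76 is best judged by its size rather than by its direction.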