Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Paper
Discernment of Nativeness of English Documents Based on Statistical Hypothesis Testing
Yoichi Tomiura, Sayaka Aoki, Masahiro Shibata, Kensei Yukino
JOURNAL FREE ACCESS

2009 Volume 16 Issue 1 Pages 1_25-1_46

Abstract
This paper proposes a method for discerning the nativeness of English documents with high precision, based on Bayes decision and statistical hypothesis testing. Regarding a document as a sequence of part-of-speech tags, the proposed method compares the probability of the document under a statistical language model of native English with its probability under a model of non-native English to discern the nativeness of the document. The statistical language model used here is an n-gram model. An n-gram model with a large n can be expected to capture the differences between native and non-native English well, and therefore has the potential to discern nativeness with high precision. However, with a large n, the zero-frequency problem and the data sparseness problem become acute, and the maximum likelihood estimates of n-gram probabilities can no longer be relied upon. The proposed method therefore estimates the ratio of the probability of the document under the native English language model to that under the non-native English language model using statistical hypothesis testing. The experimental results show that the proposed method discerns nativeness with a precision of 92.5%, which is significantly higher than that of traditional methods.
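
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' estimator: where the paper estimates the probability ratio through statistical hypothesis testing, this sketch substitutes plain add-one smoothing so that unseen n-grams do not yield zero probabilities. The function names, the part-of-speech tags, the toy corpora, and the zero decision threshold are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def train_ngram(sequences, n):
    """Count n-grams and their (n-1)-gram contexts over POS-tag sequences."""
    ngrams, contexts, vocab = Counter(), Counter(), set()
    for tags in sequences:
        padded = ["<s>"] * (n - 1) + list(tags) + ["</s>"]
        vocab.update(padded)
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            ngrams[context + (padded[i],)] += 1
            contexts[context] += 1
    return ngrams, contexts, vocab


def log_prob(tags, model, n, vocab_size):
    """Log probability of a tag sequence under an add-one smoothed n-gram model.

    Add-one smoothing is an assumption of this sketch, standing in for the
    hypothesis-testing estimate that the paper proposes.
    """
    ngrams, contexts, _ = model
    padded = ["<s>"] * (n - 1) + list(tags) + ["</s>"]
    total = 0.0
    for i in range(n - 1, len(padded)):
        context = tuple(padded[i - n + 1:i])
        numerator = ngrams[context + (padded[i],)] + 1
        denominator = contexts[context] + vocab_size
        total += math.log(numerator / denominator)
    return total


def log_ratio(tags, native_model, non_native_model, n):
    """Log of P(document | native model) / P(document | non-native model)."""
    vocab_size = len(native_model[2] | non_native_model[2])
    return (log_prob(tags, native_model, n, vocab_size)
            - log_prob(tags, non_native_model, n, vocab_size))


def is_native(tags, native_model, non_native_model, n, threshold=0.0):
    """Judge a document native when the log ratio exceeds the threshold."""
    return log_ratio(tags, native_model, non_native_model, n) > threshold


# Hypothetical toy corpora: the non-native sequences omit determiners (DT).
N = 2
native_corpus = [
    ["DT", "NN", "VBZ", "DT", "JJ", "NN"],
    ["DT", "JJ", "NN", "VBZ", "DT", "NN"],
]
non_native_corpus = [
    ["NN", "VBZ", "JJ", "NN"],
    ["JJ", "NN", "VBZ", "NN"],
]
native_model = train_ngram(native_corpus, N)
non_native_model = train_ngram(non_native_corpus, N)

print(is_native(["DT", "NN", "VBZ", "DT", "NN"], native_model, non_native_model, N))
print(is_native(["NN", "VBZ", "NN"], native_model, non_native_model, N))
```

With a threshold of zero the decision reduces to picking the model that assigns the higher probability. A Bayes decision with unequal priors would shift that threshold, and a larger n makes the smoothing choice matter more, which is the sparseness issue the abstract raises.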
© 2009 The Association for Natural Language Processing