Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
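Original Paper
Knowledge Acquisition from the Revision Logs of a Language Learning SNS for Automatic Error Correction of Japanese Learners' Writing
Tomoya Mizumoto (水本 智也), Mamoru Komachi (小町 守), Masaaki Nagata (永田 昌明), Yuji Matsumoto (松本 裕治)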
Journal, free access

2013, Volume 28, Issue 5, pp. 420-432

Abstract

Natural language processing (NLP) research has recently begun to pay attention to second language learning. However, it is not easy to acquire a large-scale learner corpus, which is important for NLP research on second language learning. We present an attempt to extract a large-scale corpus of Japanese learners' writing from the revision log of a language learning social network service. This corpus is easy to obtain at large scale, covers a wide variety of topics and styles, and can be a great source of knowledge for both language learners and instructors. We also demonstrate that the extracted learner corpus of Japanese as a second language can be used as training data for learners' error correction using a statistical machine translation (SMT) approach. Because erroneous input from language learners causes word segmentation errors, we evaluate different granularities of tokenization and propose a character-based SMT approach to alleviate this problem. Experimental results show that the character-based model outperforms the word-based model when the corpus size is small and the test data is written by learners whose L1 is English.
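
As a rough illustration of the character-level granularity mentioned in the abstract, the sketch below turns learner/correction sentence pairs into space-separated character sequences, the kind of input a phrase-based SMT toolkit typically expects. The function names and the example sentence pair are hypothetical and do not come from the paper or its corpus; the paper's actual preprocessing may differ.

```python
from typing import Iterable, List, Tuple


def to_char_tokens(sentence: str) -> str:
    """Split a sentence into space-separated characters.

    Existing whitespace (including the full-width space) is dropped, so no
    word segmenter is needed and segmentation errors cannot propagate.
    """
    return " ".join(ch for ch in sentence if not ch.isspace())


def char_parallel_corpus(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Convert (learner sentence, corrected sentence) pairs to character tokens."""
    return [(to_char_tokens(src), to_char_tokens(tgt)) for src, tgt in pairs]


# Hypothetical pair for illustration only: a learner sentence and its correction.
pairs = [("私は学生だす", "私は学生です")]
corpus = char_parallel_corpus(pairs)
```

Under this assumption the source and target sides of each pair would be written to two aligned files and passed to the SMT training pipeline in place of word-segmented text.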

© 2013 The Japanese Society for Artificial Intelligence