2020, Volume 24, Issue 4, pp. 557-567
An approach to N-best hypotheses re-ranking using a sequence-labeling model is applied to address the data deficiency problem in Grammatical Error Correction (GEC). Multiple candidate sentences are generated using a Neural Machine Translation (NMT) model; thereafter, these sentences are re-ranked via a stacked Transformer following a Bidirectional Long Short-Term Memory (BiLSTM) with Conditional Random Field (CRF). Correlations within the sentences are extracted using the sequence-labeling model based on the Transformer, which is particularly suitable for long sentences. Meanwhile, knowledge from a large amount of unlabeled data is acquired through the pre-trained structure. Thus, completely revised sentences are adopted instead of partially modified sentences. Experiments on the NUCLE and FCE datasets demonstrate that, compared with conventional NMT, the model improves the F0.5 score by 8.22% and 2.09%, respectively. An advantage of the proposed re-ranking method is that it requires only a small set of easily computed features that do not need linguistic inputs.
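The re-ranking step and the F0.5 metric mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `rerank` and `f_beta`, the `labeler_score` callback, and the linear interpolation with `weight` are all assumptions made for illustration, and the abstract does not state how the NMT and sequence-labeling scores are actually combined.

```python
from typing import Callable, List, Tuple


def rerank(
    candidates: List[Tuple[str, float]],
    labeler_score: Callable[[str], float],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank N-best (sentence, nmt_score) pairs, best first.

    Assumption: the final score is a linear interpolation of the NMT
    score and a score from a sequence-labeling model. The interpolation
    and the `weight` parameter are hypothetical, not taken from the paper.
    """
    rescored = [
        (sentence, (1.0 - weight) * nmt_score + weight * labeler_score(sentence))
        for sentence, nmt_score in candidates
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta = 0.5 weights precision above recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Toy stand-in for the sequence-labeling model (hypothetical): it
# penalizes any candidate that still lacks the corrected verb form.
def toy_labeler(sentence: str) -> float:
    return 0.0 if "goes" in sentence else -1.0


n_best = [("He go to school.", -0.2), ("He goes to school.", -0.4)]
ranked = rerank(n_best, toy_labeler)
best_sentence = ranked[0][0]
```

In this toy run the NMT model alone prefers the uncorrected sentence, while the combined score moves the corrected candidate to the top of the list.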