Data Augmentation for Low-Resource Languages in Multilingual Dependency Parsing

Jiannan Mao; Chenchen Ding; Hour Kaing; Hideki Tanaka; Masao Utiyama; Tadahiro Matsumoto

doi:10.5715/jnlp.32.219

抄録

UDify (Kondratyuk and Straka 2019) is a multilingual, multi-task parser fine-tuned on mBERT that achieves remarkable performance on high-resource languages. However, on some low-resource languages, its performance saturates early and decreases gradually as training proceeds. To address this issue, this study applies a data augmentation method to improve parsing performance. We conducted experiments on five few-shot and three zero-shot languages to test the effectiveness of this approach. The unlabeled attachment scores were improved on the zero-shot language dependency parsing tasks, with the average score increasing from 55.6% to 59.0%. Meanwhile, dependency parsing tasks in high-resource languages and other Universal Dependencies tasks were almost unaffected. The experimental results demonstrate that the data augmentation method is effective for low-resource languages in multilingual dependency parsing. Furthermore, our experiments confirm that continuously increasing the quantity of synthetic data enhances UDify's performance. This improvement was particularly effective for zero-shot target languages.

著者関連情報

Licensed under CC BY 4.0
https://creativecommons.org/licenses/by/4.0/

お気に入り & アラート

閲覧履歴

責任著者(Corresponding author)

J-STAGEへの登録はこちら（無料）