Countermeasures against Inappropriate Labels Using Active Learning in Recognizing Textual Entailment

松帆 愛; 彌冨 仁

doi:10.11517/pjsai.JSAI2023.0_3A1GS604

Abstract

Recognizing textual entailment (RTE) is an important technology, but it remains a research challenge because datasets contain a large number of inappropriate training labels. In this report, we propose Active Clean, which uses active learning (AL) to detect inappropriate labels. The method improves performance by manually assigning correct labels to a small amount of selected data and then repeating the re-training process. A sampling survey of the JSNLI dataset used in this study showed that about 10% of the labels were incorrect. When these mislabeled data were examined using Active Clean, the majority of them were estimated to be inappropriate. The RTE model built by excluding these from the training data improved the average prediction performance by 7.8% compared to the regularly trained model on test data with confirmed correct labels. This indicates that Active Clean is effective in identifying many inappropriate labels and has the potential to build more robust models.
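The select-and-relabel step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how Active Clean chooses which examples to inspect, so the selection rule here (lowest model probability for the currently assigned label) is an assumption, as are the names `select_suspicious`, `relabel_round`, and the `oracle` callback that stands in for the human annotator. Re-training between rounds is omitted.

```python
from typing import Callable, List, Sequence, Tuple


def select_suspicious(
    probs: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> List[int]:
    """Return indices of the k examples whose assigned label receives the
    lowest predicted probability (assumed selection rule, not from the paper)."""
    scored = [(p[y], i) for i, (p, y) in enumerate(zip(probs, labels))]
    scored.sort()  # lowest probability for the assigned label first
    return [i for _, i in scored[:k]]


def relabel_round(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int,
    oracle: Callable[[int], int],
) -> Tuple[List[int], List[int]]:
    """One round: ask the oracle (a hypothetical stand-in for manual
    annotation) about the k most suspicious examples.
    Returns the updated labels and the indices whose label changed."""
    new_labels = list(labels)
    changed = []
    for i in select_suspicious(probs, labels, k):
        corrected = oracle(i)
        if corrected != new_labels[i]:
            new_labels[i] = corrected
            changed.append(i)
    return new_labels, changed
```

In a full loop, the model would be re-trained on the updated labels (or with the flagged examples excluded) and the round repeated until the annotation budget is used up.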
