Host: The Japanese Society for Artificial Intelligence
Name : The 37th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 37
Location : [in Japanese]
Date : June 06, 2023 - June 09, 2023
Recognizing textual entailment (RTE) is an important technology, but it remains a research challenge because the datasets contain a large number of inappropriate training labels. In this report, we propose Active Clean, which uses active learning (AL) to detect inappropriate labels. The method improves performance by manually assigning correct labels to a small amount of selected data and then repeating the re-training process. A sampling survey of the JSNLI dataset used in this study showed that about 10% of the labels were incorrect. When these mislabeled data were examined using Active Clean, the majority of them were estimated to be inappropriate. For test data with confirmed correct labels, the RTE model built by excluding these data from the training set improved average prediction performance by 7.8% compared with the regularly trained model. This indicates that Active Clean is effective at identifying many inappropriate labels and has the potential to build more robust models.
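The loop described above (train, select a small set of suspicious labels, have a human correct them, re-train) can be illustrated with a minimal sketch. The abstract does not state the selection criterion or the model, so everything here is an assumption made for illustration: a toy 1-D nearest-centroid classifier stands in for the RTE model, the `suspicion` score is a hypothetical disagreement measure, and `oracle` is a hypothetical stand-in for the human annotator. The function names are not taken from the report.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple


def fit_centroids(xs: Sequence[float], ys: Sequence[int]) -> Dict[int, float]:
    """Toy stand-in for model training: one mean per class (1-D nearest centroid)."""
    centroids = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = sum(members) / len(members)
    return centroids


def suspicion(x: float, y: int, centroids: Dict[int, float]) -> float:
    """Hypothetical score: positive when x lies closer to another class
    centroid than to the centroid of its own (possibly wrong) label."""
    own = abs(x - centroids[y])
    other = min(abs(x - c) for label, c in centroids.items() if label != y)
    return own - other


def active_clean(
    xs: Sequence[float],
    noisy_ys: Sequence[int],
    oracle: Callable[[int], int],
    rounds: int = 1,
    batch: int = 2,
) -> Tuple[List[int], Set[int]]:
    """Repeat: re-train, pick the `batch` most suspicious unreviewed samples,
    and replace their labels with the oracle's answer."""
    ys = list(noisy_ys)
    reviewed: Set[int] = set()
    for _ in range(rounds):
        centroids = fit_centroids(xs, ys)  # re-training step
        if len(centroids) < 2:
            break  # the score needs at least two classes
        candidates = [i for i in range(len(xs)) if i not in reviewed]
        candidates.sort(key=lambda i: suspicion(xs[i], ys[i], centroids), reverse=True)
        for i in candidates[:batch]:
            ys[i] = oracle(i)  # manual relabeling, simulated here
            reviewed.add(i)
    return ys, reviewed


# Toy data: two well-separated clusters, with the labels at indices 1 and 8 flipped.
xs = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 5.3, 5.4]
true_ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
noisy_ys = [0, 1, 0, 0, 0, 1, 1, 1, 0, 1]

cleaned, reviewed = active_clean(xs, noisy_ys, oracle=lambda i: true_ys[i])
print(sorted(reviewed))  # → [1, 8]
print(cleaned == true_ys)  # → True
```

In this sketch the flagged samples are relabeled. Dropping the indices in `reviewed` from the training data instead would correspond more closely to the exclusion experiment reported in the abstract.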