Host: The Japanese Society for Artificial Intelligence
Name: The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 38
Location: [in Japanese]
Date: May 28, 2024 - May 31, 2024
Multi-label text classification is a common task in the medical domain. However, preparing a training dataset is costly because manual annotation is laborious and requires extensive domain-specific knowledge. Here we introduce an automated data augmentation method using ChatGPT, in which new training data are generated from the ground-truth data (NTCIR-13 MedWeb Japanese corpus). The method is adaptive because it uses a baseline BERT model, fine-tuned on the ground-truth dataset, to actively filter the generated training data. The final model, trained on the merged ground-truth and augmented data, showed a 2.4% improvement in F1 score over the baseline model. The proposed method can help solve multi-label classification problems in the medical domain.
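The filter-then-merge step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the filtering criterion, so the rule used here (keep a generated example only when the baseline's thresholded predictions reproduce the labels the example was generated for) is an assumption, as are the function names, the 0.5 threshold, the two-label set, and the keyword-based `toy_predict_proba`, which stands in for the fine-tuned BERT baseline so that the example runs without any model.

```python
from typing import Callable, List, Sequence, Tuple

# (text, multi-hot label vector); hypothetical representation for illustration
Example = Tuple[str, Tuple[int, ...]]


def filter_generated(
    generated: Sequence[Example],
    predict_proba: Callable[[str], Sequence[float]],
    threshold: float = 0.5,
) -> List[Example]:
    """Keep generated examples whose intended labels the baseline reproduces.

    Assumed criterion: the paper's actual filtering rule is not given
    in the abstract.
    """
    kept = []
    for text, intended in generated:
        probs = predict_proba(text)
        predicted = tuple(int(p >= threshold) for p in probs)
        if predicted == tuple(intended):
            kept.append((text, tuple(intended)))
    return kept


def merge_datasets(
    ground_truth: Sequence[Example], filtered: Sequence[Example]
) -> List[Example]:
    """Append filtered examples to the ground truth, skipping duplicate texts."""
    seen = {text for text, _ in ground_truth}
    merged = list(ground_truth)
    for text, labels in filtered:
        if text not in seen:
            merged.append((text, labels))
            seen.add(text)
    return merged


# Hypothetical label set and stand-in classifier (not the real BERT baseline).
LABELS = ("fever", "cough")


def toy_predict_proba(text: str) -> List[float]:
    lowered = text.lower()
    return [0.9 if label in lowered else 0.1 for label in LABELS]


if __name__ == "__main__":
    ground_truth = [("I have a fever", (1, 0))]
    generated = [
        ("I have a fever and a cough", (1, 1)),
        ("Nice weather today", (1, 0)),  # intended label not supported by text
        ("My cough is getting worse", (0, 1)),
    ]
    kept = filter_generated(generated, toy_predict_proba)
    print(merge_datasets(ground_truth, kept))
```

In practice, `predict_proba` would wrap the fine-tuned baseline's per-label sigmoid outputs, and the merged list would be used to train the final model.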