Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 2G6-GS-6-03

ChatGPT-based adaptive data augmentation for multi-label Japanese text classification in the medical domain
*Tadashi TSUBOTA
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Multi-label text classification is a common task in the medical domain. However, preparing a training dataset is costly because manual annotation is laborious and requires extensive domain-specific knowledge. Here we introduce an automated data augmentation method using ChatGPT, in which new training data are generated from the ground-truth data (NTCIR-13 MedWeb Japanese corpus). The method is adaptive because it uses a baseline BERT model, fine-tuned on the ground-truth dataset, to actively filter the generated training data. The final model, trained on the merged ground-truth and augmented data, showed a 2.4% improvement in F1 score over the baseline model. The proposed algorithms can help solve multi-label classification problems in the medical domain.
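
The filter-then-merge step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual filtering criterion, so the rule used here (keep a generated example only when the baseline model's thresholded predictions exactly match the labels the example was generated for) is an assumption, as are the names `filter_generated`, `merge_datasets`, and `toy_predict_proba`. The toy predictor is a keyword matcher standing in for the fine-tuned baseline BERT model, and the English sentences stand in for ChatGPT-generated Japanese text.

```python
from typing import Callable, List, Sequence, Tuple

# One example: (text, multi-hot label vector), e.g. ("...", (1, 0)).
Example = Tuple[str, Tuple[int, ...]]


def filter_generated(
    generated: Sequence[Example],
    predict_proba: Callable[[str], Sequence[float]],
    threshold: float = 0.5,
) -> List[Example]:
    """Keep generated examples whose intended labels the baseline model confirms.

    ASSUMPTION: exact agreement between thresholded predictions and intended
    labels is the acceptance rule; the paper's real criterion may differ.
    """
    kept: List[Example] = []
    for text, labels in generated:
        probs = predict_proba(text)
        predicted = tuple(int(p >= threshold) for p in probs)
        if predicted == tuple(labels):
            kept.append((text, tuple(labels)))
    return kept


def merge_datasets(
    ground_truth: Sequence[Example], augmented: Sequence[Example]
) -> List[Example]:
    """Concatenate ground-truth and filtered augmented examples."""
    return list(ground_truth) + list(augmented)


def toy_predict_proba(text: str) -> List[float]:
    """Hypothetical stand-in for the baseline model: labels are (cold, fever)."""
    return [
        1.0 if "cold" in text else 0.0,
        1.0 if "fever" in text else 0.0,
    ]


ground_truth = [("I caught a cold", (1, 0))]
generated = [
    ("My cold will not go away", (1, 0)),  # baseline agrees with intended labels
    ("I feel fine today", (0, 1)),         # baseline disagrees, so it is dropped
]

kept = filter_generated(generated, toy_predict_proba)
train_set = merge_datasets(ground_truth, kept)
```

In a real pipeline, `predict_proba` would wrap the fine-tuned baseline classifier's per-label sigmoid outputs, and `train_set` would be used to train the final model.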

© 2024 The Japanese Society for Artificial Intelligence