Host: The Japanese Society for Artificial Intelligence
Name: The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 38
Location: [in Japanese]
Date: May 28, 2024 - May 31, 2024
Generating training data for sequence labeling tasks, such as named entity recognition (NER), incurs significant annotation costs. One approach to this efficiency problem is to present the outputs of pretrained sequence labeling models as references during annotation. However, most publicly available sequence labeling models use general label sets, with labels such as PERSON, LOCATION, and ORGANIZATION, and therefore cannot pre-label domain-specific entities. This study proposes a method that improves the efficiency of generating training data for domain-specific sequence labeling tasks by presenting pre-labeled results obtained from a large language model (LLM) as references during annotation. First, we construct a prompt that instructs the LLM to enclose the domain-specific entities in the target sentence with XML-format tags, and submit it to the model. We then parse the LLM's output and display the extracted entities as label candidates in the annotation tool. This approach makes pre-labeled results available even in domains with limited data. We report on the impact of this method on annotation cost when using this system.
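The pipeline described above has two mechanical steps: building a prompt that asks the LLM to wrap domain-specific entities in XML-style tags, and parsing the tagged output back into character spans that the annotation tool can show as label candidates. The following is a minimal Python sketch of both steps under stated assumptions, not the authors' implementation: the label names (DISEASE, DRUG) and the prompt wording are illustrative, and the LLM call itself is left abstract.

```python
import re

# Hypothetical domain-specific label set; the paper's actual labels are not
# given in the abstract, so DISEASE and DRUG are illustrative stand-ins.
LABELS = ["DISEASE", "DRUG"]

PROMPT_TEMPLATE = (
    "Enclose every domain-specific entity in the sentence below with "
    "XML-style tags.\n"
    "Allowed tags: {tags}.\n"
    "Return the sentence unchanged except for the added tags.\n\n"
    "Sentence: {sentence}"
)


def build_prompt(sentence: str) -> str:
    """Build the tagging prompt to be sent to the LLM."""
    tags = ", ".join(f"<{label}>...</{label}>" for label in LABELS)
    return PROMPT_TEMPLATE.format(tags=tags, sentence=sentence)


# Matches one tagged span, e.g. <DRUG>Aspirin</DRUG>.
TAG_PATTERN = re.compile(r"<(?P<label>\w+)>(?P<text>.*?)</(?P=label)>")


def parse_tagged_output(tagged: str) -> list[tuple[int, int, str, str]]:
    """Convert an XML-tagged sentence into (start, end, label, text) spans
    over the plain (untagged) sentence, usable as pre-label candidates."""
    spans = []
    plain_parts = []
    cursor = 0
    for match in TAG_PATTERN.finditer(tagged):
        plain_parts.append(tagged[cursor:match.start()])
        start = sum(len(part) for part in plain_parts)
        text = match.group("text")
        spans.append((start, start + len(text), match.group("label"), text))
        plain_parts.append(text)
        cursor = match.end()
    return spans


if __name__ == "__main__":
    print(build_prompt("Aspirin is used to treat fever."))
    # Stand-in for the LLM's response; a real pipeline would obtain this
    # string from a chat-completion API call.
    tagged = "<DRUG>Aspirin</DRUG> is used to treat <DISEASE>fever</DISEASE>."
    for span in parse_tagged_output(tagged):
        print(span)  # e.g. (0, 7, 'DRUG', 'Aspirin')
```

In practice an LLM may return malformed or extra tags, which is presumably why the abstract mentions formatting the output before display; a real pipeline would validate the parsed spans against the original sentence before offering them as candidates.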