Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 4Xin2-114

Annotation of Sequence Labeling with Pre-labeling by Large Language Models
*Kanato ISHII, Takuro NIITSUMA, Yuya TAGUCHI, Yosuke YAMANO, Kaori SUGINO, Hideaki TAMORI
Abstract

Generating training data incurs significant annotation costs in sequence labeling tasks such as named entity recognition (NER). One approach to this efficiency issue is to present the outputs of pre-trained sequence labeling models as references during annotation. However, most publicly available sequence labeling models use general label sets with labels such as PERSON, LOCATION, and ORGANIZATION, so they cannot pre-label domain-specific entities. This study proposes a method that improves the efficiency of generating training data for domain-specific sequence labeling tasks by presenting pre-labeled results obtained from large language models (LLMs) as references during annotation. First, we construct a prompt that instructs the LLM to surround the domain-specific entities in the target sentence with XML-style tags, and submit it to the model. We then parse the LLM's output and display the extracted spans as label candidates in the annotation tool. This approach makes pre-labeled results available even in domains with limited data. We report on the impact of this method on annotation cost using this system.
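The following Python sketch illustrates one possible realization of this flow. It is a minimal sketch under stated assumptions: the label set (DISEASE, DRUG), the prompt wording, and the helper names are illustrative and are not taken from the paper. It builds a prompt asking an LLM to wrap domain-specific entities in XML-style tags, then parses the tagged reply into character-offset spans that an annotation tool could display as pre-label candidates.

```python
# Illustrative sketch of LLM pre-labeling for domain-specific NER.
# LABELS, the prompt text, and all helper names are hypothetical.
import re

LABELS = ["DISEASE", "DRUG"]  # hypothetical domain-specific label set

PROMPT_TEMPLATE = (
    "Surround every domain-specific entity in the following sentence "
    "with XML-style tags chosen from {labels} "
    "(e.g. <DISEASE>influenza</DISEASE>). Return only the tagged sentence.\n\n"
    "Sentence: {sentence}"
)


def build_prompt(sentence: str) -> str:
    """Build the prompt that asks the LLM to tag entities in XML format."""
    return PROMPT_TEMPLATE.format(labels=", ".join(LABELS), sentence=sentence)


def parse_tagged_output(tagged: str) -> tuple[str, list[dict]]:
    """Strip the XML-style tags from the LLM reply and return the plain
    sentence plus character-offset spans usable as pre-label candidates."""
    pattern = re.compile(r"<(\w+)>(.*?)</\1>")
    plain_parts, spans = [], []
    cursor, last_end = 0, 0
    for m in pattern.finditer(tagged):
        plain_parts.append(tagged[last_end:m.start()])
        cursor += m.start() - last_end
        entity = m.group(2)
        if m.group(1) in LABELS:
            spans.append({"start": cursor, "end": cursor + len(entity),
                          "label": m.group(1)})
        plain_parts.append(entity)
        cursor += len(entity)
        last_end = m.end()
    plain_parts.append(tagged[last_end:])
    return "".join(plain_parts), spans


# Usage: send build_prompt(sentence) to an LLM, then parse its reply.
reply = "The patient was given <DRUG>aspirin</DRUG> for <DISEASE>migraine</DISEASE>."
print(parse_tagged_output(reply))
# -> ('The patient was given aspirin for migraine.',
#     [{'start': 22, 'end': 29, 'label': 'DRUG'},
#      {'start': 34, 'end': 42, 'label': 'DISEASE'}])
```

The parsed spans can then be loaded into the annotation tool as editable suggestions, so annotators correct or confirm candidates instead of labeling every entity from scratch.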

© 2024 The Japanese Society for Artificial Intelligence