Host: The Japanese Society for Artificial Intelligence
Name: The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 38
Location: [in Japanese]
Date: May 28, 2024 - May 31, 2024
Generating training data for sequence labeling tasks, such as named entity recognition (NER), incurs significant annotation costs. One approach to this efficiency problem is to present the outputs of pretrained sequence labeling models as references during annotation. However, most publicly available sequence labeling models use general label sets, with labels such as PERSON, LOCATION, and ORGANIZATION, and therefore cannot pre-label domain-specific entities. This study proposes a method that improves the efficiency of generating training data for domain-specific sequence labeling tasks by presenting pre-labeled results obtained from a large language model (LLM) as references during annotation. First, we construct a prompt that instructs the LLM to enclose the domain-specific entities in the target sentence with XML-format tags, and submit it to the model. We then parse the LLM's output and display the extracted entities as label candidates in the annotation tool. This approach makes pre-labeled results available even in domains with limited data. We report on the impact of this method on annotation cost when using this system.
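The pipeline described above has two mechanical steps: building a prompt that asks the LLM to wrap domain-specific entities in XML-style tags, and parsing the tagged output back into character spans that the annotation tool can show as label candidates. The following is a minimal Python sketch of both steps under stated assumptions, not the authors' implementation: the label names (DISEASE, DRUG) and the prompt wording are illustrative, and the LLM call itself is left abstract.

```python
import re

# Hypothetical domain-specific label set; the paper's actual labels are not
# given in the abstract, so DISEASE and DRUG are illustrative stand-ins.
LABELS = ["DISEASE", "DRUG"]

PROMPT_TEMPLATE = (
    "Enclose every domain-specific entity in the sentence below with "
    "XML-style tags.\n"
    "Allowed tags: {tags}.\n"
    "Return the sentence unchanged except for the added tags.\n\n"
    "Sentence: {sentence}"
)


def build_prompt(sentence: str) -> str:
    """Build the tagging prompt to be sent to the LLM."""
    tags = ", ".join(f"<{label}>...</{label}>" for label in LABELS)
    return PROMPT_TEMPLATE.format(tags=tags, sentence=sentence)


# Matches one tagged span, e.g. <DRUG>Aspirin</DRUG>.
TAG_PATTERN = re.compile(r"<(?P<label>\w+)>(?P<text>.*?)</(?P=label)>")


def parse_tagged_output(tagged: str) -> list[tuple[int, int, str, str]]:
    """Convert an XML-tagged sentence into (start, end, label, text) spans
    over the plain (untagged) sentence, usable as pre-label candidates."""
    spans = []
    plain_parts = []
    cursor = 0
    for match in TAG_PATTERN.finditer(tagged):
        plain_parts.append(tagged[cursor:match.start()])
        start = sum(len(part) for part in plain_parts)
        text = match.group("text")
        spans.append((start, start + len(text), match.group("label"), text))
        plain_parts.append(text)
        cursor = match.end()
    return spans


if __name__ == "__main__":
    print(build_prompt("Aspirin is used to treat fever."))
    # Stand-in for the LLM's response; a real pipeline would obtain this
    # string from a chat-completion API call.
    tagged = "<DRUG>Aspirin</DRUG> is used to treat <DISEASE>fever</DISEASE>."
    for span in parse_tagged_output(tagged):
        print(span)  # e.g. (0, 7, 'DRUG', 'Aspirin')
```

In practice an LLM may return malformed or extra tags, which is presumably why the abstract mentions formatting the output before display; a real pipeline would validate the parsed spans against the original sentence before offering them as candidates.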