J-NER: A Benchmark Dataset for Named Entity Recognition in Large Language Models That Accounts for the Extended Named Entity Hierarchy

渋谷 優介; 澁谷 紘人

doi:10.11517/pjsai.JSAI2024.0_4Xin206

Abstract

An important part of understanding a language model is determining whether it can recognize the structure and relationships within sentences. Named entities, such as place names and person names, are among the main components of language, so studying how language models recognize named entities is an important theme in understanding them. Named entity recognition matters for large language models as well, but compared with general language models, areas such as the development of datasets for named entity recognition remain underexplored. In this study, we therefore create a new benchmark dataset, "J-NER", which includes named entities from the training data of large language models and extended named entities. Using this dataset, we evaluate the large language models Gemini Pro, GPT-3.5, and ELYZA, and find variation in accuracy and F1 score. This suggests that J-NER is effective for measuring the named entity recognition ability of large language models, and we expect that using J-NER will yield deeper insight into that ability.
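
The abstract does not state how accuracy and F1 score are computed. As a rough illustration only, the sketch below assumes exact-match, entity-level, micro-averaged scoring, where each entity is a (surface text, type) pair. The function name `micro_prf`, the `Entity` representation, and the sample sentences and labels are all hypothetical and are not taken from J-NER.

```python
from typing import List, Set, Tuple

# Hypothetical representation: (surface text, entity type).
Entity = Tuple[str, str]


def micro_prf(gold: List[Set[Entity]],
              pred: List[Set[Entity]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 under exact matching.

    Assumption: a predicted entity counts as correct only if both its
    surface text and its type match a gold entity in the same sentence.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # entities predicted and present in gold
        fp += len(p - g)   # entities predicted but not in gold
        fn += len(g - p)   # gold entities the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up example: two sentences, one partially wrong prediction.
gold = [
    {("東京", "Location"), ("夏目漱石", "Person")},
    {("京都大学", "Organization")},
]
pred = [
    {("東京", "Location"), ("漱石", "Person")},  # truncated span: no match
    {("京都大学", "Organization")},
]

precision, recall, f1 = micro_prf(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under exact matching, the truncated span in the first sentence counts as both a false positive and a false negative. A more lenient scheme, such as partial-span or type-only matching, would give different scores.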
