Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
Weakly Supervised Named Entity Recognition Using Thesaurus with Hierarchical Structure
Takayoshi Shibahara, Ikuya Yamada, Noriki Nishida, Hiroki Teranishi, Kouji Kozaki, Yuji Matsumoto
JOURNAL FREE ACCESS

2024 Volume 31 Issue 3 Pages 984-1014

Abstract

Named entity recognition (NER) is a fundamental task in natural language processing. However, traditional NER methods require large amounts of supervised data and therefore cannot respond flexibly to real-world demands, such as extracting categories at different granularities depending on the user’s requirements. Weakly supervised NER, which uses the contexts in which known words occur as pseudo-data, can meet the demand for a variety of categories when combined with a large-scale thesaurus. Previous studies on weakly supervised NER have proposed learning methods that are robust to errors in the pseudo-supervised data. However, models trained with such methods suffer from a side effect: their predictions cross the boundaries between categories of interest and categories not of interest. To mitigate this shortcoming, we propose a method that uses all categories in the thesaurus, including those demanded by users, for pseudo-data generation, and we empirically demonstrate the usefulness of the holistic knowledge contained in the thesaurus.
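
The pseudo-data generation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy thesaurus, the category names, the single-token matching, and the choice to label non-target terms with their root category are all assumptions made for illustration. The sketch only contrasts labelling terms from the user's target categories alone with labelling terms from every category in the thesaurus.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy thesaurus (assumption): term -> category, category -> parent.
TERM_TO_CATEGORY: Dict[str, str] = {
    "aspirin": "Drug",
    "influenza": "ViralDisease",
    "fever": "Symptom",
}
PARENT: Dict[str, Optional[str]] = {
    "Drug": "Chemical",
    "ViralDisease": "Disease",
    "Symptom": "Finding",
    "Chemical": None,
    "Disease": None,
    "Finding": None,
}


def ancestors(category: str) -> List[str]:
    """Return the category followed by all of its ancestors, root last."""
    chain: List[str] = []
    current: Optional[str] = category
    while current is not None:
        chain.append(current)
        current = PARENT.get(current)
    return chain


def pseudo_label(
    tokens: List[str], targets: Set[str], use_all_categories: bool = True
) -> List[str]:
    """Assign one pseudo-label per token by exact thesaurus lookup.

    Terms under a target category receive that target label. Other known
    terms receive their root category when use_all_categories is True
    (an assumed way of marking non-target categories), otherwise "O".
    """
    labels: List[str] = []
    for token in tokens:
        category = TERM_TO_CATEGORY.get(token.lower())
        if category is None:
            labels.append("O")  # token is not in the thesaurus
            continue
        chain = ancestors(category)
        target = next((c for c in chain if c in targets), None)
        if target is not None:
            labels.append(target)
        elif use_all_categories:
            labels.append(chain[-1])  # explicit non-target category label
        else:
            labels.append("O")
    return labels


tokens = "Aspirin reduces fever caused by influenza".split()
labels_all = pseudo_label(tokens, {"Disease"})
labels_target_only = pseudo_label(tokens, {"Disease"}, use_all_categories=False)
```

With the target set `{"Disease"}`, `labels_target_only` marks only "influenza", while `labels_all` additionally marks "Aspirin" and "fever" with their own root categories, so a model trained on it sees explicit examples of known terms that fall outside the categories of interest.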

© 2024 The Association for Natural Language Processing