Weakly Supervised Named Entity Recognition Using the Hierarchical Structure of a Thesaurus

芝原 隆善; 山田 育矢; 西田 典起; 寺西 裕紀; 古崎 晃司; 松本 裕治

doi:10.5715/jnlp.31.984

Abstract

Named entity recognition (NER) is a fundamental task in natural language processing. However, traditional NER methods require large amounts of supervised data, so they cannot respond flexibly to real-world demands such as extracting categories whose granularity varies with the user's requirements. Weakly supervised NER, which uses the contexts in which known words occur as pseudo-data, can meet the demand for a variety of categories when combined with a large-scale thesaurus. Previous studies on weakly supervised NER have proposed learning methods that are robust to errors in the pseudo-supervised data. However, models trained with such methods suffer from a side effect: their predictions cross the boundary between categories of interest and categories not of interest. To mitigate this shortcoming, we propose a method that uses all categories in the thesaurus, not only those requested by users, for pseudo-data generation, and we empirically demonstrate the usefulness of the holistic knowledge contained in the thesaurus.
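
The idea of generating pseudo-data from every thesaurus category can be illustrated with a minimal sketch. It is not the authors' implementation: the toy thesaurus (PARENT, TERMS), the "other:" label prefix, and the greedy longest-match strategy are all assumptions made for illustration. Terms under a user-requested category receive that category as a label, while terms from the remaining categories receive an explicit label of their own instead of being left unlabeled.

```python
# Toy thesaurus. Every name here is hypothetical and not taken from the paper.
PARENT = {
    "disease": "root",
    "chemical": "root",
    "infectious disease": "disease",
}
TERMS = {
    ("influenza",): "infectious disease",
    ("aspirin",): "chemical",
    ("lung", "cancer"): "disease",
}


def ancestors(category):
    """Yield the category and each of its ancestors, stopping before the root."""
    while category in PARENT:
        yield category
        category = PARENT[category]


def project(category, targets):
    """Return the nearest requested ancestor, or an explicit non-target label."""
    for candidate in ancestors(category):
        if candidate in targets:
            return candidate
    # Assumption: non-target categories keep their own label rather than "O".
    return "other:" + category


def pseudo_label(tokens, targets):
    """Label tokens by greedy longest dictionary match against TERMS."""
    labels = ["O"] * len(tokens)
    max_len = max(len(term) for term in TERMS)
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok.lower() for tok in tokens[i:i + n])
            if key in TERMS:
                labels[i:i + n] = [project(TERMS[key], targets)] * n
                i += n
                break
        else:
            i += 1  # no thesaurus term starts at this token
    return labels


sentence = "Aspirin does not cure influenza or lung cancer".split()
print(list(zip(sentence, pseudo_label(sentence, targets={"disease"}))))
```

With "disease" as the only requested category, "influenza" and "lung cancer" are labeled "disease", whereas "Aspirin" is labeled "other:chemical", which gives a model an explicit signal about where the requested category ends.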

Licensed under CC BY 4.0
https://creativecommons.org/licenses/by/4.0/
