Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
Auxiliary Learning for Named Entity Recognition with Multiple Auxiliary Training Data
Taiki Watanabe, Tomoya Ichikawa, Akihiro Tamura, Tomoya Iwakura, Chunpeng Ma, Tsuneo Kato

2023 Volume 30 Issue 2 Pages 507-530

Abstract

Named entity recognition (NER) is a core technology for knowledge acquisition from text and has been used for knowledge extraction in domains such as chemistry and medicine. One approach to improving NER is multi-task learning, which learns a model from multiple training datasets. Among multi-task learning methods, auxiliary learning, which uses the training data of an auxiliary task to improve a target task, has shown higher NER performance than conventional multi-task learning, which improves all tasks simultaneously. However, the conventional auxiliary learning method uses only one auxiliary training dataset. We propose Multiple Utilization of NER Corpora Helpful for Auxiliary BLESsing (MUNCHABLES). MUNCHABLES utilizes multiple training datasets as auxiliary training data in two ways: the first fine-tunes the NER model of the target task by sequentially performing auxiliary learning on each auxiliary training dataset; the second uses all auxiliary training datasets in a single auxiliary-learning run. We evaluate MUNCHABLES on eight chemical/biomedical/scientific-domain NER tasks, where seven training datasets are used as auxiliary training data for each target task. The experimental results show that our methods achieve higher micro- and macro-average F1 scores than both a conventional auxiliary learning method using one auxiliary training dataset and conventional multi-task learning. Furthermore, our method achieves the highest F1 score on the s800 dataset.
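The two training schedules described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the toy stand-in for an auxiliary-learning round, and the example dataset names (other than s800, which the abstract mentions) are all assumptions for illustration.

```python
# Hedged sketch of the two MUNCHABLES schedules (assumed names, toy training step).

def auxiliary_learning(model, target_data, aux_datasets):
    """One auxiliary-learning round: train for the target task using the
    given auxiliary dataset(s). The real training step is abstracted away;
    here we only record which auxiliary datasets the model has seen."""
    return model + [tuple(aux_datasets)]

def munchables_sequential(model, target_data, aux_datasets):
    """First variant: fine-tune the target NER model by running auxiliary
    learning once per auxiliary dataset, in sequence."""
    for aux in aux_datasets:
        model = auxiliary_learning(model, target_data, [aux])
    return model

def munchables_all_in_one(model, target_data, aux_datasets):
    """Second variant: use all auxiliary datasets in a single
    auxiliary-learning run."""
    return auxiliary_learning(model, target_data, aux_datasets)

# Illustrative auxiliary corpora; only s800 appears in the abstract.
aux = ["bc5cdr", "ncbi-disease", "s800"]
seq_model = munchables_sequential([], "target-ner-data", aux)
one_model = munchables_all_in_one([], "target-ner-data", aux)
```

Under this toy bookkeeping, the sequential variant performs three separate auxiliary-learning rounds, while the all-in-one variant performs a single round over all three corpora.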

© 2023 The Association for Natural Language Processing