Compound Name Extraction from Documents Using a BiLSTM-CRF Model That Considers Multiple Subword Sequences

関根 裕人; 浦澤 合; 乾 孝司; 岩倉 友哉

doi:10.11517/pjsai.JSAI2019.0_1N4J902

33rd (2019)

Session ID : 1N4-J-9-02

Conference information

Host: The Japanese Society for Artificial Intelligence

Name : The 33rd Annual Conference of the Japanese Society for Artificial Intelligence, 2019

Number : 33

Date : June 04, 2019 - June 07, 2019

Adding Multiple Subword Sequences to BiLSTM-CRF Model for Compound Name Extraction

*Hiroto SEKINE, Go URASAWA, Takashi INUI, Tomoya IWAKURA

Keywords: Named Entity Recognition, Deep Learning, Subword

CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

In this paper, we propose a BiLSTM-CRF model for extracting compound names from documents in the chemical domain. The proposed model can take multiple subword sequences as input in order to obtain sufficient features for long-span or unknown tokens. Subword LSTM units with contextual information are introduced in the input layer of the model. We conducted experiments based on the CHEMDNER challenge to investigate the effectiveness of the model. The proposed model achieved higher extraction accuracy than a standard BiLSTM-CRF model, and experimental results on unknown words also showed that the proposed method works better.
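
As a rough illustration of what "multiple subword sequences" for a single token might look like, the sketch below segments one long compound name at two granularities. The abstract does not say how subwords are produced, so the greedy longest-match segmentation, the function names `segment` and `multi_subword_inputs`, and both vocabularies are assumptions made for this example only. In the proposed model, such sequences would presumably be passed to the subword LSTM units in the input layer; that part is not shown here.

```python
from typing import Dict, List, Set


def segment(token: str, vocab: Set[str]) -> List[str]:
    """Greedy longest-match segmentation (an assumed scheme, not the paper's).

    Characters that match no vocabulary entry are emitted one by one.
    """
    pieces: List[str] = []
    i = 0
    while i < len(token):
        # Try the longest candidate first and shrink until a vocabulary
        # entry matches; a single character is always accepted as fallback.
        for j in range(len(token), i, -1):
            if token[i:j] in vocab or j == i + 1:
                pieces.append(token[i:j])
                i = j
                break
    return pieces


def multi_subword_inputs(
    tokens: List[str], vocabs: Dict[str, Set[str]]
) -> List[Dict[str, List[str]]]:
    """Return, for each token, one subword sequence per vocabulary."""
    return [
        {name: segment(tok, vocab) for name, vocab in vocabs.items()}
        for tok in tokens
    ]


# Hypothetical vocabularies of different granularity (illustrative only).
VOCABS: Dict[str, Set[str]] = {
    "coarse": {"chloro", "methyl", "benzene"},
    "fine": {"chl", "oro", "me", "th", "yl", "ben", "zene"},
}

features = multi_subword_inputs(["chloromethylbenzene"], VOCABS)
for name, pieces in features[0].items():
    print(name, pieces)
```

Even if the whole token is absent from a word vocabulary, each segmentation breaks it into known pieces, which is the kind of extra evidence for long or unknown tokens that the abstract refers to.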
