Extraction of Chemical Substance Names from Patent Publication Text

田中 るみ子; 中山 伸一

doi:10.2477/jccj.2021-0047

Abstract

To use knowledge about chemicals effectively, the names of core chemical substances must be extracted, organized, and collated together with their structures, functions, manufacturing methods, chemical reactions, and uses. Doing this by hand takes considerable time and effort. Extracting these names from Japanese sentences is further complicated because, unlike English, Japanese does not separate words with spaces or symbols. One must therefore first perform morphological analysis to divide the text into distinct words and then group the words that belong to a chemical name. When this grouping attaches an unnecessary word to the name of a chemical substance, that word must be removed. In this study, we focus on the character type, arrangement, and context of chemical names in Japanese sentences. We created a corpus of patent publications tagged with chemical names and used it as training data for a machine learning model. We then examined whether chemical names can be extracted using this method.
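As a rough illustration of what "character type and arrangement" could mean as input to a model, the following minimal sketch splits a Japanese string into runs of the same character class (hiragana, katakana, kanji, Latin letters, digits, symbols). It is an assumption made for illustration only: the function names `char_type` and `char_type_runs` are hypothetical, the Unicode ranges are a coarse approximation, and the abstract does not state which features or model the authors actually used.

```python
import unicodedata
from itertools import groupby


def char_type(ch: str) -> str:
    """Return a coarse character class for a single character (hypothetical helper)."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:  # includes the prolonged sound mark
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:  # CJK Unified Ideographs (basic block only)
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "latin"
    if ch.isdigit():
        return "digit"
    return "symbol"


def char_type_runs(text: str) -> list[tuple[str, str]]:
    """Group consecutive characters of the same class into (substring, class) runs."""
    # NFKC folds full-width Latin letters/digits and half-width katakana
    # into their standard forms before classification.
    normalized = unicodedata.normalize("NFKC", text)
    return [
        ("".join(chars), ctype)
        for ctype, chars in groupby(normalized, key=char_type)
    ]


if __name__ == "__main__":
    # A sentence fragment containing a chemical name followed by ordinary text.
    for run, ctype in char_type_runs("2-メチルプロパンを加えた"):
        print(ctype, run)
```

In this sketch, the digit, hyphen, and katakana runs at the start of the string form the kind of pattern that is typical of a chemical name, while the following hiragana particle marks a plausible boundary. A sequence of such classes could serve as one feature among others for a sequence-labeling model trained on a tagged corpus.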
