Journal of Computer Chemistry, Japan
Online ISSN : 1347-3824
Print ISSN : 1347-1767
ISSN-L : 1347-1767
General Paper
Extraction of Chemical Substance Names from Patent Publications
Rumiko TANAKA, Shin-ichi NAKAYAMA

2022 Volume 21 Issue 1 Pages 1-9

Abstract

To make effective use of knowledge about chemicals, the names of core chemical substances must be efficiently extracted, organized, and collated together with their structures, functions, manufacturing methods, chemical reactions, and uses. This work takes considerable time and effort. Unlike English, Japanese text does not separate words with spaces or symbols. Extracting chemical substance names from Japanese sentences therefore first requires morphological analysis to divide each sentence into words, followed by grouping the words that together form a chemical name. When this grouping attaches an unnecessary word to a chemical name, that word must be removed. In this study, we focus on the character types, character arrangement, and context of chemical names in Japanese sentences. We created a corpus from patent publications in which the chemical names are tagged, used it as training data for a machine learning model, and examined whether chemical names can be extracted with this approach.
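To illustrate what "character type" and "arrangement" could mean as features, the following is a minimal sketch in Python. It is not the authors' implementation: the function names (`char_type`, `type_runs`, `char_features`), the six coarse character classes, the Unicode ranges chosen for them, and the example sentence are all assumptions made for illustration. The sketch labels each character by script, splits a sentence into runs of the same character type, and builds a small per-character feature dictionary with the neighbouring character types as context, which is the kind of input a sequence-labeling model might take.

```python
from itertools import groupby


def char_type(ch: str) -> str:
    """Return a coarse character class (hypothetical label set)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:   # Unicode Hiragana block
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:   # Unicode Katakana block (includes the long-vowel mark)
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:   # CJK Unified Ideographs (basic block only)
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "latin"
    if ch.isdigit():
        return "digit"
    return "symbol"


def type_runs(text: str) -> list[tuple[str, str]]:
    """Split text into maximal runs of characters sharing one character type."""
    return [("".join(run), ctype) for ctype, run in groupby(text, key=char_type)]


def char_features(text: str, i: int) -> dict[str, str]:
    """Features for the character at index i: its own type plus its neighbours' types."""
    return {
        "char": text[i],
        "type": char_type(text[i]),
        "prev_type": char_type(text[i - 1]) if i > 0 else "BOS",
        "next_type": char_type(text[i + 1]) if i + 1 < len(text) else "EOS",
    }


if __name__ == "__main__":
    # Hypothetical example sentence: "2-methylpropane was added"
    sentence = "2-メチルプロパンを加えた"
    for surface, ctype in type_runs(sentence):
        print(surface, ctype)
```

In this example the digit, hyphen, and katakana runs at the start of the sentence together form the chemical name, while the following hiragana particle marks where it ends. Character-type runs alone cannot decide such boundaries in general, which is why context features and a tagged training corpus are needed.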

© 2022 Society of Computer Chemistry, Japan