Journal of Information Processing
Online ISSN : 1882-6652
ISSN-L : 1882-6652
Identification of Cybersecurity Specific Content Using Different Language Models
Otgonpurev Mendsaikhan, Hirokazu Hasegawa, Yukiko Yamaguchi, Hajime Shimada, Enkhbold Bataa

2020 Volume 28 Pages 623-632


Given the sheer amount of digital text publicly available on the Internet, it is becoming increasingly challenging for security analysts to identify cyber threat-related content. In this research, we propose an autonomous system that identifies cyber threat information from publicly available information sources. We examined different language models for use as a cybersecurity-specific filter in the proposed system. Using domain-specific training data, we trained Doc2Vec and BERT models and compared their performance. According to our evaluation, the BERT-based Natural Language Filter identifies and classifies cybersecurity-specific natural language text with 90% accuracy.

© 2020 by the Information Processing Society of Japan