Journal of Information Processing
Online ISSN : 1882-6652
ISSN-L : 1882-6652
Identification of Cybersecurity Specific Content Using Different Language Models
Otgonpurev Mendsaikhan, Hirokazu Hasegawa, Yukiko Yamaguchi, Hajime Shimada, Enkhbold Bataa
Keywords: cyber threat, NLP, Text-Classification
Journal, free access

2020, Volume 28, pp. 623-632

Abstract

Given the sheer volume of digital text publicly available on the Internet, it is becoming increasingly challenging for security analysts to identify cyber-threat-related content. In this research, we propose to build an autonomous system that identifies cyber threat information from publicly available information sources. We examined different language models for use as a cybersecurity-specific filter in the proposed system. Using domain-specific training data, we trained Doc2Vec and BERT models and compared their performance. According to our evaluation, the BERT-based Natural Language Filter identifies and classifies cybersecurity-specific natural language text with 90% accuracy.

© 2020 by the Information Processing Society of Japan