Journal of Information Processing
Online ISSN : 1882-6652
ISSN-L : 1882-6652
Leaving All Proxy Server Logs to Paragraph Vector
Mamoru MimuraHidema Tanaka
Author information
JOURNAL FREE ACCESS

2018 Volume 26 Pages 804-812

Details
Abstract

Cyberattack techniques continue to evolve every day. Detecting unseen drive-by-download attacks or C&C traffic is a challenging task. Pattern-matching-based techniques and using malicious blacklists are not efficient anymore, because attackers easily change the traffic pattern or infrastructure to avoid detection. Therefore, many behavior-based detection methods have been proposed, which use the immutable characteristic of the traffic. These previous methods, however, focus on the attack technique, and can only detect drive-by-download (DbD) attacks or C&C traffic which have the immutable characteristic. These traditional methods have to devise the feature vectors. This paper proposes a generic detection method, which is independent of attack methods and does not need devising feature vectors. Our method uses Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length texts and classifiers. Our method uses Paragraph Vector to capture the context in proxy server logs. We conducted cross-validation, timeline analysis and cross-dataset validation with multiple datasets. The experimental results show our method can detect unseen DbD attacks and C&C traffic in proxy server logs. The best F-measure achieved 0.97 in the timeline analysis and 0.96 on the other dataset.

Content from these authors
© 2018 by the Information Processing Society of Japan
Previous article Next article
feedback
Top