2019 Volume E102.D Issue 9 Pages 1665-1672
Epidemic cyber incidents are caused by malicious websites using exploit kits. The exploit kit facilitate attackers to perform the drive-by download (DBD) attack. However, it is reported that malicious websites using an exploit kit have similarity in their website structure (WS)-trees. Hence, malicious website identification techniques leveraging WS-trees have been studied, where the WS-trees can be estimated from HTTP traffic data. Nevertheless, the defensive component of the exploit kit prevents us from capturing the WS-tree perfectly. This paper shows, hence, a new WS-tree construction procedure by using the fact that a DBD attack happens in a certain duration. This paper proposes, moreover, a new malicious website identification technique by clustering the WS-tree of the exploit kits. Experiment results assuming the D3M dataset verify that the proposed technique identifies exploit kits with a reasonable accuracy even when HTTP traffic from the malicious sites are partially lost.