2013 Volume 21 Issue 3 Pages 539-550
Web-based malware attacks have become one of the most serious threats that need to be addressed urgently. Several approaches that have attracted attention as promising ways of detecting such malware include employing one of several blacklists. However, these conventional approaches often fail to detect new attacks owing to the versatility of malicious websites. Thus, it is difficult to maintain up-to-date blacklists with information for new malicious websites. To tackle this problem, this paper proposes a new scheme for detecting malicious websites using the characteristics of IP addresses. Our approach leverages the empirical observation that IP addresses are more stable than other metrics such as URLs and DNS records. While the strings that form URLs or DNS records are highly variable, IP addresses are less variable, i.e., IPv4 address space is mapped onto 4-byte strings. In this paper, a lightweight and scalable detection scheme that is based on machine learning techniques is developed and evaluated. The aim of this study is not to provide a single solution that effectively detects web-based malware but to develop a technique that compensates the drawbacks of existing approaches. The effectiveness of our approach is validated by using real IP address data from existing blacklists and real traffic data on a campus network. The results demonstrate that our scheme can expand the coverage/accuracy of existing blacklists and also detect unknown malicious websites that are not covered by conventional approaches.