Information and Media Technologies
Online ISSN : 1881-0896
ISSN-L : 1881-0896
Computer Networks and Broadcasting
Web Tracking Site Detection Based on Temporal Link Analysis and Automatic Blacklist Generation
Akira YamadaMasanori HaraYutaka Miyake
Author information
JOURNAL FREE ACCESS

2011 Volume 6 Issue 2 Pages 560-571

Details
Abstract

Web tracking sites or Web bugs are potential but serious threats to users' privacy during Web browsing. Web sites and their associated advertising sites surreptitiously gather the profiles of visitors and possibly abuse or improperly expose them, even if visitors are unaware their profiles are being utilized. In order to prevent such sites in a corporate network, most companies employ filters that rely on blacklists, but these lists are insufficient. In this paper, we propose Web tracking sites detection and blacklist generation based on temporal link analysis. Our proposal analyzes traffic at the network gateway so that it can monitor all tracking sites in the administrative network. The proposed algorithm constructs a graph between sites and their visited time in order to characterize each site. Then, the system classifies suspicious sites using machine learning. We confirm that public black lists contain at most 22-70% of the known tracking sites respectively. The machine learning can identify the blacklisted sites with true positive rate, 62-73%, which is more accurate than any single blacklist. Although the learning algorithm falsely identified 15% of unlisted sites, 96% of these are verified to be unknown tracking sites by means of a manual labeling. These unknown tracking sites can serve as good candidates for an entry of a new backlist.

Content from these authors
© 2011 Information Processing Society of Japan
Previous article Next article
feedback
Top