Automatically Inferring the Evolution of Malicious Activity on the Internet
Internet-based services routinely contend with a range of malicious activity (e.g., spam, scans, botnets) that can arise from virtually any part of the global Internet infrastructure and that can shift over time. In this paper, we develop the first algorithmic techniques to automatically infer, in an online fashion, regions of the Internet with shifting security characteristics. Conceptually, our key idea is to model the malicious activity on the Internet as a decision tree over the IP address space, and to identify the dynamics of the malicious activity by inferring the dynamics of the decision tree. Our evaluations on large corpora of mail data and botnet data indicate that our algorithms are fast, can keep up with Internet-scale traffic data, and can extract changes in sources of malicious activity substantially better (by a factor of 2.5) than approaches based on predetermined levels of aggregation, such as BGP-based network-aware clusters. Our case studies demonstrate our algorithms’ ability to summarize large shifts in malicious activity into a small number of IP regions (a reduction of as much as two orders of magnitude), and thus to help focus limited operator resources. Using our algorithms, we find that some regions of the Internet are prone to much faster changes than others, such as a set of small and medium-sized hosting providers that are of particular interest to mail operators.
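To make the central idea concrete, the following is a minimal illustrative sketch, not the algorithm developed in the paper. It assumes labeled IPv4 samples from two time windows, builds a simple binary prefix tree for each window offline (the paper's techniques are online), labels each leaf by majority vote, and reports the prefixes whose label differs between the windows. The function names (`ip_to_int`, `build_leaves`, `classify`, `changed_regions`), the parameters `min_samples` and `max_len`, and the sample addresses are all hypothetical choices made for illustration.

```python
import ipaddress


def ip_to_int(ip):
    """Convert a dotted-quad IPv4 string to a 32-bit integer."""
    return int(ipaddress.IPv4Address(ip))


def build_leaves(samples, prefix=0, length=0, min_samples=4, max_len=24):
    """Split the address space bit by bit; return leaves as (prefix, length, label).

    samples: list of (address_int, is_malicious) pairs.
    A node becomes a leaf when it is pure, has too few samples, or is too deep.
    Assumption: a leaf is labeled malicious if malicious samples are the majority.
    """
    bad = sum(1 for _, malicious in samples if malicious)
    good = len(samples) - bad
    label = bad > good
    if bad == 0 or good == 0 or len(samples) < min_samples or length >= max_len:
        return [(prefix, length, label)]
    bit = 31 - length  # next bit of the address to split on
    zeros = [s for s in samples if not (s[0] >> bit) & 1]
    ones = [s for s in samples if (s[0] >> bit) & 1]
    return (build_leaves(zeros, prefix, length + 1, min_samples, max_len)
            + build_leaves(ones, prefix | (1 << bit), length + 1, min_samples, max_len))


def classify(leaves, addr):
    """Return the label of the unique leaf whose prefix covers addr."""
    for prefix, length, label in leaves:
        shift = 32 - length
        if (addr >> shift) == (prefix >> shift):
            return label
    raise ValueError("leaves do not cover the address")


def changed_regions(old, new):
    """List (cidr, new_label) for regions whose label differs between two trees.

    Two leaves overlap when one prefix contains the other; their intersection
    is the longer prefix, which is the region reported.
    """
    changes = []
    for o_prefix, o_len, o_label in old:
        for n_prefix, n_len, n_label in new:
            shift = 32 - min(o_len, n_len)
            overlap = (o_prefix >> shift) == (n_prefix >> shift)
            if overlap and o_label != n_label:
                p, l = (o_prefix, o_len) if o_len >= n_len else (n_prefix, n_len)
                changes.append((str(ipaddress.IPv4Network((p, l))), n_label))
    return changes


# Hypothetical data: hosts under 10.0.0.0/8 turn malicious in the second window.
hosts_a = [ip_to_int(f"10.0.0.{i}") for i in range(1, 5)]
hosts_b = [ip_to_int(f"192.0.2.{i}") for i in range(1, 5)]
window1 = [(a, False) for a in hosts_a] + [(a, True) for a in hosts_b]
window2 = [(a, True) for a in hosts_a] + [(a, True) for a in hosts_b]

old_leaves = build_leaves(window1)
new_leaves = build_leaves(window2)
changes = changed_regions(old_leaves, new_leaves)
print(changes)
```

In this toy run, eight changed sample labels collapse into a single reported prefix, which is a small-scale analogue of summarizing a large shift in activity as a few IP regions. A realistic system would also need to update the tree incrementally and bound its size, which this sketch does not attempt.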