Behavioral Modeling of Botnet Populations Viewed through Internet Protocol Address Space
A botnet is a collection of computers infected by a shared set of malicious software, that maintain communications to a single human administrator or small organized group. Botnets are indirectly observable populations; cyber-analysts often measure a botnet’s threat in terms of its size, but size is derived from a count of the observable network touchpoints through which infected machines communicate. Activity is often a count of packets or connection attempts, representing logins to command and control servers, spam messages sent, peer-to-peer communications, or other discrete network behavior. Front line analysts use sandbox testing of a botnet’s malicious software to discover signatures for detecting an infected computer and shutting it down, but there is less focus on modeling the botnet population as a collection of machines obscured by the kaleidoscope view of Internet Protocol (IP) address space. This research presents a Bayesian model for generic modeling of a botnet due to its observable activity across a network. A generation-allocation model is proposed, that separates observable network activity at time t into the counts yt generated by the malicious software, and the network’s allocation of these counts among available IP addresses. As a first step, the framework outlines how to develop a directly observable behavioral model informed by sandbox tests and day-to-day user activity, and then how to use this model as a basis for population estimation in settings using proxies or Network Address Translation (NAT) in which only the aggregate sum of all machine activity is observed. The model is explored via a case study using the Conficker-C botnet that emerged in March of 2009.