Scan Detection on Very Large Networks Using Logistic Regression Modeling
Journal contribution posted on 01.01.2005 by Carrie Gates, Joshua J. McNutt, Joseph B. Kadane, Marc I. Kellner
Scanning is common on the Internet today and represents malicious activity such as information gathering by a motivated adversary or automated tools searching for vulnerable hosts (e.g., worms). Many scan detection techniques have been developed; however, they focus on smaller networks where packet-level information is available or where internal characteristics of the network are known. For large networks, such as those of ISPs, large corporations, or government organizations, this information might not be available. This paper presents a model of scans that can be used given only unidirectional flow data. The model uses Bayesian logistic regression and was developed using a combination of expert opinion and manually classified training data. It is shown to have a detection rate of 95.5% and a false positive rate of 0.4% overall when tested against a set of 300 TCP events.
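As a rough illustration of the quantities the abstract mentions, the sketch below scores an event with a logistic model and computes a detection rate and false positive rate from labelled events. It is not the authors' model: the function names, the feature vector, the weights, and the intercept are all hypothetical placeholders. It also uses fixed point estimates, whereas a Bayesian fit would typically encode expert opinion as priors on the coefficients.

```python
import math


def scan_probability(features, weights, intercept):
    """Logistic model: P(scan) = 1 / (1 + exp(-(intercept + w . x))).

    `features`, `weights`, and `intercept` are hypothetical inputs;
    the abstract does not list the variables or coefficients used.
    """
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def detection_and_false_positive_rate(labels, predictions):
    """Return (detection rate, false positive rate) for boolean sequences.

    Detection rate      = true positives  / all actual scans.
    False positive rate = false positives / all actual non-scans.
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    fp = sum(1 for y, p in pairs if not y and p)
    tn = sum(1 for y, p in pairs if not y and not p)
    detection_rate = tp / (tp + fn) if (tp + fn) else float("nan")
    false_positive_rate = fp / (fp + tn) if (fp + tn) else float("nan")
    return detection_rate, false_positive_rate


def classify(features, weights, intercept, threshold=0.5):
    """Flag an event as a scan when its probability reaches the threshold.

    The 0.5 threshold is an assumption made for this sketch.
    """
    return scan_probability(features, weights, intercept) >= threshold
```

In use, each flow-derived event would be reduced to a feature vector, passed through `classify`, and the resulting predictions compared with manually assigned labels to obtain the two rates.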