Limits of Learning-Based Signature Generation with Adversaries
journal contributionposted on 01.01.2008, 00:00 by Shobha Venkataraman, Avrim Blum, Dawn Song
Automatic signature generation is necessary because there may often be little time between the discovery of a vulnerability, and exploits developed to target the vulnerability. Much research effort has focused on patternextraction techniques to generate signatures. These have included techniques that look for a single large invariant substring of the byte sequences, as well as techniques that look for many short invariant substrings. Pattern-extraction techniques are attractive because signatures can be generated and matched efficiently, and earlier work has shown the existence of invariants in exploits. In this paper, we show fundamental limits on the accuracy of pattern-extraction algorithms for signaturegeneration in an adversarial setting. We formulate a framework that allows a unified analysis of these algorithms, and prove lower bounds on the number of mistakes any patternextraction learning algorithm must make under common assumptions, by showing how to adapt results from learning theory. While previous work has targeted specific algorithms, our work generalizes these attacks through theoretical analysis to any algorithm with similar assumptions, not just the techniques developed so far. We also analyze when pattern-extraction algorithms may work, by showing conditions under which these lower bounds are weakened. Our results are applicable to other kinds of signature-generation algorithms as well, those that use properties of the exploit that can be manipulated.