Limits of Learning-Based Signature Generation with Adversaries

Venkataraman, Shobha; Blum, Avrim; Song, Dawn

doi:10.1184/R1/6468965.v1

journal contribution

posted on 2008-01-01, 00:00 authored by Shobha Venkataraman, Avrim Blum, Dawn Song

Automatic signature generation is necessary because there is often little time between the discovery of a vulnerability and the development of exploits that target it. Much research effort has focused on pattern-extraction techniques to generate signatures. These include techniques that look for a single large invariant substring of the byte sequences, as well as techniques that look for many short invariant substrings. Pattern-extraction techniques are attractive because signatures can be generated and matched efficiently, and earlier work has shown the existence of invariants in exploits. In this paper, we show fundamental limits on the accuracy of pattern-extraction algorithms for signature generation in an adversarial setting. We formulate a framework that allows a unified analysis of these algorithms, and prove lower bounds on the number of mistakes any pattern-extraction learning algorithm must make under common assumptions, by showing how to adapt results from learning theory. While previous work has targeted specific algorithms, our work generalizes these attacks through theoretical analysis to any algorithm with similar assumptions, not just the techniques developed so far. We also analyze when pattern-extraction algorithms may work, by showing conditions under which these lower bounds are weakened. Our results also apply to other kinds of signature-generation algorithms, namely those that use properties of the exploit that can be manipulated.

Keywords: Electrical & Computer Engineering

Licence: In Copyright
