Posted on 2007-01-01, 00:00. Authored by Juan Caballero, Heng Yin, Zhenkai Liang, Dawn Song
Protocol reverse engineering, the process of extracting the
application-level protocol used by an implementation, without
access to the protocol specification, is important for
many network security applications. Recent work [17] has
proposed protocol reverse engineering by using clustering
on network traces. Such approaches are limited by the
lack of semantic information in network traces. In this paper
we propose a new approach using program binaries. Our
approach, shadowing, uses dynamic binary analysis and is
based on a unique intuition—the way that an implementation
of the protocol processes the received application data
reveals a wealth of information about the protocol message
format. We have implemented our approach in a system
called Polyglot and evaluated it extensively using real-world
implementations of five different protocols: DNS, HTTP,
IRC, Samba and ICQ. We compare our results with the manually
crafted message formats included in Wireshark, one
of the state-of-the-art protocol analyzers. The differences
we find are small and usually due to different implementations
handling fields in different ways. Finding such differences
between implementations is an added benefit, as they
are important for problems such as fingerprint generation,
fuzzing, and error detection.
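
The intuition behind shadowing can be loosely illustrated with a toy sketch. This is not Polyglot's mechanism, which applies dynamic binary analysis to real program binaries. Here, a hypothetical `TracedBuffer` simply records which byte ranges a hypothetical `toy_parser` consumes, and those recorded ranges are read back as candidate field boundaries. The message layout (1-byte type, 1-byte length, variable payload) and all names are assumptions made for the example.

```python
from typing import List, Tuple


class TracedBuffer:
    """Wraps received bytes and records every byte range the parser reads.

    Hypothetical stand-in for monitoring how an implementation touches
    its input; a real system would observe this at the instruction level.
    """

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0
        self.accesses: List[Tuple[int, int]] = []  # (start, end) offsets

    def read(self, n: int) -> bytes:
        start, end = self.pos, self.pos + n
        if end > len(self.data):
            raise ValueError("message truncated")
        self.accesses.append((start, end))
        self.pos = end
        return self.data[start:end]


def toy_parser(buf: TracedBuffer) -> None:
    """Assumed implementation under analysis: type, length, payload."""
    buf.read(1)               # message type
    length = buf.read(1)[0]   # length field decides how much is read next
    buf.read(length)          # variable-length payload


def infer_fields(message: bytes) -> List[Tuple[int, int]]:
    """Run the parser on one message and return the ranges it consumed."""
    buf = TracedBuffer(message)
    toy_parser(buf)
    return buf.accesses


if __name__ == "__main__":
    print(infer_fields(b"\x01\x03abc"))
```

Each recorded range corresponds to one unit the parser treated as a field, which is the kind of information that network traces alone do not expose.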