ProtoGANist: Protocol Reverse Engineering using Generative Adversarial Networks

2019-06-06T19:09:59Z (GMT) by Carolina Zarate
Many reported vulnerabilities are related to the way a system accepts, processes, and interprets protocol packets and the information they contain. Adversaries can trigger these vulnerabilities by sending specially crafted packets to the system. Typical solutions to this problem generate packets that follow the protocol format, send them to the system, and observe the resulting behavior. However, these solutions break down when both the system and the protocol are black boxes, because it is unclear how to generate realistic protocol packets. We present ProtoGANist, a system that uses generative machine learning models to model unknown protocol message formats and produce messages similar to the underlying format. Given sample messages from a black-box protocol and a black-box system that uses the protocol, our goal is to learn to produce randomized, protocol-compliant messages. The difficulty of this task lies in the complexity of the protocol message format: the values, lengths, and overall structure of message fields may be defined by complex functions that depend on other fields. These dependencies are difficult for existing tools to capture, primarily because they may result from several operations performed on the values or lengths of many fields, as in checksums. Generative Adversarial Networks (GANs) have been shown to learn to generate samples similar to the data they are trained on. GANs have traditionally been used in image processing to build generative models of images. We leverage this capability in a novel way to learn the message format of an unknown protocol: ground-truth sample messages of the unknown protocol are provided to the GAN system. We show that ProtoGANist is able to identify and learn complex message format features. Using a separate testing system, which can produce protocols with different characteristics to exercise the complexities that may exist in protocol message formats, we demonstrate that ProtoGANist outperforms other state-of-the-art tools at learning these features.
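To make the point about inter-field dependencies concrete, here is a minimal sketch using a hypothetical toy format (it is not one of the thesis's test protocols, and it does not implement ProtoGANist or a GAN). The format has a type byte, a one-byte length field, a payload, and a trailing checksum over all preceding bytes. The names `build_message`, `is_compliant`, and `mutate` are illustrative assumptions. The sketch shows why naive byte-level fuzzing of a captured message almost never yields a compliant message: changing any single byte breaks either the length or the checksum dependency.

```python
import random

# Hypothetical toy format (an assumption for illustration, not from the thesis):
#   byte 0        : message type (valid values 0x01..0x03)
#   byte 1        : payload length N
#   bytes 2..N+1  : payload
#   last byte     : checksum = sum of all preceding bytes mod 256


def build_message(msg_type: int, payload: bytes) -> bytes:
    """Build a compliant message; the length and checksum fields depend on other fields."""
    if not 1 <= msg_type <= 3:
        raise ValueError("unknown message type")
    if len(payload) > 255:
        raise ValueError("payload too long for a 1-byte length field")
    body = bytes([msg_type, len(payload)]) + payload
    return body + bytes([sum(body) % 256])


def is_compliant(msg: bytes) -> bool:
    """Check every inter-field dependency of the toy format."""
    if len(msg) < 3:
        return False
    msg_type, length = msg[0], msg[1]
    if not 1 <= msg_type <= 3:
        return False
    if len(msg) != length + 3:  # type + length + payload + checksum
        return False
    return msg[-1] == sum(msg[:-1]) % 256


def mutate(msg: bytes, rng: random.Random) -> bytes:
    """Naive fuzzing: overwrite one randomly chosen byte with a random value."""
    out = bytearray(msg)
    out[rng.randrange(len(msg))] = rng.randrange(256)
    return bytes(out)


if __name__ == "__main__":
    rng = random.Random(0)
    seed = build_message(2, b"hello")
    trials = 10_000
    survived = sum(is_compliant(mutate(seed, rng)) for _ in range(trials))
    print(f"seed compliant: {is_compliant(seed)}")
    print(f"mutants still compliant: {survived}/{trials}")
```

In this toy format, only mutations that happen to write back the original byte remain compliant (about 1 in 256). A generator that produces valid messages therefore has to capture the length and checksum relationships jointly rather than field by field. This is the kind of dependency the abstract says existing tools struggle with and ProtoGANist aims to learn.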