ProtoGANist: Protocol Reverse Engineering using Generative Adversarial Networks

Zarate, Carolina

doi:10.1184/R1/8224286.v1

Zarate_cmu_0041O_10393.pdf (1.26 MB)

thesis

posted on 2019-06-06, 19:09 authored by Carolina Zarate

Many reported vulnerabilities are related to the way that a system accepts, processes, and interprets protocol packets and the information they contain. Adversaries can trigger these vulnerabilities by sending specially crafted packets to the system. Typical solutions to this problem include generating packets in accordance with the protocol format, sending them to the system, and observing the resulting behavior of the system. However, these solutions fall apart when dealing with a black-box system and black-box protocols, because it is unclear how to generate realistic protocol packets. We present ProtoGANist, a system that uses generative machine learning models to model unknown protocol message formats and produce messages that conform to the underlying format. Given sample messages from a black-box protocol and a black-box system that uses the protocol, our goal is to learn to produce randomized protocol-compliant messages. The difficulty of this task lies in the complexity of the protocol message format. Message fields' values, lengths, and overall structure may be defined by complex functions that depend on other fields. These dependencies are difficult for existing tools to capture, primarily because they may result from several operations performed on the values or lengths of many fields, as in checksums. Generative Adversarial Networks (GANs) have been shown to be able to learn to generate samples that are similar to the data given to them. GANs have traditionally been used in image processing to create generative models of images. We leverage this capability in a novel way to learn the message format of an unknown protocol: ground-truth sample messages of the unknown protocol are provided to the GAN system. We show that ProtoGANist is able to identify and learn complex message format features. Using a separate testing system, we demonstrate that ProtoGANist outperforms other state-of-the-art tools at this task. The testing system produces protocols with different characteristics in order to exercise the complexities that may exist in protocol message formats.
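To make the field-dependency problem described in the abstract concrete, the following is a minimal sketch of a toy message format in which a length field and a checksum both depend on other fields. The format, the `MAGIC` value, and the function names `build_message`, `is_compliant`, and `flip_one_bit` are hypothetical and chosen only for illustration; they are not taken from the thesis and do not represent ProtoGANist or its testing system.

```python
import random
import struct
import zlib

MAGIC = 0xA5  # hypothetical 1-byte marker, not from the thesis


def build_message(payload: bytes) -> bytes:
    """Toy layout: magic (1 byte) | payload length (2 bytes, big-endian)
    | payload | CRC-32 over everything before it (4 bytes)."""
    header = struct.pack(">BH", MAGIC, len(payload))
    checksum = zlib.crc32(header + payload)
    return header + payload + struct.pack(">I", checksum)


def is_compliant(message: bytes) -> bool:
    """Stand-in for a black-box system that accepts or rejects a message."""
    if len(message) < 7:  # 3 header bytes + 4 checksum bytes
        return False
    magic, length = struct.unpack(">BH", message[:3])
    if magic != MAGIC or len(message) != 3 + length + 4:
        return False  # the length field must agree with the payload size
    (checksum,) = struct.unpack(">I", message[-4:])
    return checksum == zlib.crc32(message[:-4])  # depends on all prior bytes


def flip_one_bit(message: bytes, rng: random.Random) -> bytes:
    """Naive mutation: flip a single randomly chosen bit."""
    mutated = bytearray(message)
    index = rng.randrange(len(mutated))
    mutated[index] ^= 1 << rng.randrange(8)
    return bytes(mutated)


if __name__ == "__main__":
    rng = random.Random(0)
    message = build_message(b"hello")
    accepted = sum(is_compliant(flip_one_bit(message, rng)) for _ in range(1000))
    print(is_compliant(message), accepted)
```

In this toy format every single-bit mutation of a valid message is rejected, because the change breaks the magic value, the length field, or the checksum. A generator therefore has to capture the relationships between fields, not only the distribution of each field on its own.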

History

Date

2019-05-06

Degree Type

Master's Thesis

Department

Information Networking Institute

Degree Name

Master of Science (MS)

Advisor(s)

Vyas Sekar, Giulia Fanti

Keywords

Computer Security, deep learning, fuzzing, Generative Adversarial Networks, reverse engineering, protocols

Licence

In Copyright
