Carnegie Mellon University

Attacking, Defending, and Evaluating Machine-Learning-Based Raw-Binary Malware Detectors

thesis
posted on 2024-08-16, 17:53, authored by Keane Lucas

Machine learning (ML) models have shown promise in classifying raw executable files (binaries) as malicious or benign with high accuracy. This has led to the increasing influence of ML-based classification methods in academic and real-world malware detection, a critical component of cybersecurity. This thesis examines and improves the reliability of these ML-based malware detectors. First, we propose an attack that interweaves binary-diversification techniques and optimization frameworks to mislead such malware detectors while preserving the transformed binaries’ functionality. Unlike prior attacks, ours manipulates instructions that are a functional part of the binary, which makes defending against the attack particularly challenging. We then investigate the effectiveness of using adversarial training methods to create malware classification models that are more robust to our attacks. To make adversarial training practical for raw-binary malware detectors, we significantly increase the efficiency and scale of attack creation. In the best case, we reduce the success rate of one of our most potent attacks from 90% to 5% and show that training with some types of attacks can increase robustness to other types of attacks. We then accelerate the training and evaluation of robust malware detectors by introducing a fast training augmentation and several proxy measures that can quickly indicate increased robustness to more computationally expensive attacks. We show that this quick robustness estimation allows us to find robust malware detectors while executing fewer expensive attacks. Finally, we investigate the behavior of ML-based malware detectors by analyzing the similarity of their explanations to explanations of YARA rules created by human experts.

Thesis statement: Machine-learning-based (ML-based) raw-binary malware detectors can be fooled by adversarially modified binaries. These detectors can also be made more robust via computationally expensive adversarial training. A faster training augmentation, combined with new methods for estimating robustness, can make the detectors even more robust and quicker to train.

History

Date

2024-07-02

Degree Type

  • Dissertation

Department

  • Electrical and Computer Engineering

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Lujo Bauer
