Malware analysts often need to search large corpuses of obfuscated binaries
for particular sequences of related instructions. The use of simple tactics, such as dead code insertion and register renaming, prevents the use of conventional, big-data search indexes. Current, state of the art malware detectors are unable to handle the size of the dataset due to their iterative approach to comparing files. Furthermore, current work is also frequently designed to act as a detector and not a search tool. I propose a system that exploits the observation that many data/control-flow relationships between instructions are preserved in the presence of obfuscations. The system will extract chains of flow-dependent instructions from a binary’s Program Dependence Graph (PDG). It will then use a representation of each chain as a key for an index that points to lists of functions (and their corresponding files). Analysts will be able to quickly search for instruction sequences by querying the index.