ReDeBug: Finding Unpatched Code Clones in Entire OS Distributions
Programmers should never fix the same bug twice. Unfortunately this often happens when patches to buggy code are not propagated to all code clones. Unpatched code clones represent latent bugs, and for security-critical problems, latent vulnerabilities, thus are important to detect quickly. In this paper we present ReDeBug, a system for quickly finding unpatched code clones in OS-distribution scale code bases. While there has been previous work on code clone detection, ReDeBug represents a unique design point that uses a quick, syntax-based approach that scales to OS distribution-sized code bases that include code written in many different languages. Compared to previous approaches, ReDeBug may find fewer code clones, but gains scale, speed, reduces the false detection rate, and is language agnostic. We evaluated ReDeBug by checking all code from all packages in the Debian Lenny/Squeeze, Ubuntu Maverick/Oneiric, all Source Forge C and C++ projects, and the Linux kernel for unpatched code clones. ReDeBug processed over 2.1 billion lines of code at 700,000 LoC/min to build a source code database, then found 15,546 unpatched copies of known vulnerable code in currently deployed code by checking 376 Debian/Ubuntu security-related patches in 8 minutes on a commodity desktop machine. We show the real world impact of ReDeBug by confirming 145 real bugs in the latest version of Debian Squeeze packages.