Mariposa: Measuring SMT Instability in Automated Program Verification (Technical Report)
This is a technical report in supplementary to the paper titiled "Mariposa: Measuring SMT Instability in Automated Program Verification" at FMCAD 2023.This report includes extended experiments results not presetned the conference paper.
Program verification has been successfully applied to increasingly large and complex systems. Much of this recent success can be attributed to the automation provided by dispatching verification condition queries viaSMT solvers. However, multiple teams anecdotally report that this style of automated verification is plagued by proof instability, where semantically irrelevant changes to the query can have large effects on the SMT solver’s response.
In this work, we present Mariposa, a tool to detect and quantify instability. To better understand the status quo of instability, we apply Mariposa to a set of 17,043 SMT queries from six existing program verification projects. We discover that SMT solver upgrades often make projects less stable, and that the most recent SMT solver version is unstable on 2.6% of the queries. For individual projects, the unstable ratio can grow to 5.0%. Based on our experimental results, we curate the Mariposa benchmark, which we hope will help measure and incentivize stability improvements in SMT-based program verification