Carnegie Mellon University

Evaluating Static Analysis Alerts with LLMs

online resource
posted on 2024-10-08, 17:58, authored by William Klieber and Lori Flynn

For safety-critical systems in areas such as defense and medical devices, software assurance is crucial. Analysts can use static analysis tools to evaluate source code without running it, allowing them to identify potential vulnerabilities. Despite their usefulness, the current generation of heuristic static analysis tools requires significant manual effort and is prone to producing both false positives (spurious warnings) and false negatives (missed warnings). Recent research from the SEI estimates that these tools can flag as many as one candidate error ("weakness") every three lines of code, so engineers often prioritize fixing only the most common and severe errors. Less common errors, however, can still lead to critical vulnerabilities. For example, a "flooding" attack on a network-based service can overwhelm the target with requests until the service crashes, yet neither of the related weaknesses ("improper resource shutdown or release" and "allocation of resources without limits or throttling") appears on the 2023 CWE Top 25 Most Dangerous Software Weaknesses list, the Known Exploited Vulnerabilities (KEV) Top 10 list, or the CWE Top 25 Stubborn Weaknesses (2019-2023) list. In our research, large language models (LLMs) show promising initial results in adjudicating static analysis alerts and providing rationales for their adjudications, suggesting possibilities for better vulnerability detection. In this blog post, we discuss our initial experiments using GPT-4 to evaluate static analysis alerts, the limitations of using LLMs for alert evaluation, and opportunities to collaborate with us on future work.
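To make the idea concrete, the sketch below shows one way an alert-adjudication query to an LLM could be structured: a single static analysis alert, plus the flagged source snippet, is sent to GPT-4 with a request for a true-positive/false-positive verdict and a rationale. This is a minimal illustration, not the experimental harness described in the blog post; the adjudicate_alert helper, the prompt wording, the checker name, and the accept_connection/spawn_handler_thread snippet are all illustrative assumptions. It assumes the openai Python package and an OPENAI_API_KEY environment variable.

# Minimal sketch (not the SEI experimental harness): ask GPT-4 to adjudicate
# one static analysis alert, given the alert metadata and the flagged code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def adjudicate_alert(checker: str, cwe: str, message: str, code_snippet: str) -> str:
    """Ask the model whether a single alert is a true or false positive."""
    prompt = (
        "You are reviewing a static analysis alert on C code.\n"
        f"Checker: {checker}\n"
        f"CWE: {cwe}\n"
        f"Alert message: {message}\n"
        "Relevant source code:\n"
        f"{code_snippet}\n"
        "Answer 'TRUE POSITIVE' or 'FALSE POSITIVE' on the first line, "
        "then give a short rationale."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep adjudications as repeatable as possible
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # Hypothetical alert for CWE-770 (allocation of resources without limits or
    # throttling), the kind of weakness behind the flooding attack described above.
    snippet = (
        "while ((conn = accept_connection(sock)) != NULL) {\n"
        "    spawn_handler_thread(conn);  /* no bound on concurrent handlers */\n"
        "}"
    )
    print(adjudicate_alert(
        checker="unbounded-resource-allocation",
        cwe="CWE-770",
        message="Resource allocated in a loop without an upper bound or throttling",
        code_snippet=snippet,
    ))

Asking for the verdict and the rationale together is deliberate: the rationale is what lets an analyst check the model's reasoning rather than accept the adjudication on faith.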


Publisher Statement

This material is based upon work funded and supported by the Department of Defense under Contract No. FA8702-15-D-0002 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center. The views, opinions, and/or findings contained in this material are those of the author(s) and should not be construed as an official Government position, policy, or decision, unless designated by other documentation. References herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise do not necessarily constitute or imply its endorsement, recommendation, or favoring by Carnegie Mellon University or its Software Engineering Institute. This report was prepared for the SEI Administrative Agent, AFLCMC/AZS, 5 Eglin Street, Hanscom AFB, MA 01731-2100.

NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN "AS-IS" BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.

[DISTRIBUTION STATEMENT A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US Government use and distribution.

Copyright Statement

Copyright 2024 Carnegie Mellon University.
