Carnegie Mellon University

GenAI for Code Review of C++ and Java

online resource
posted on 2024-11-19, 22:08, authored by David Schulker, Vedha Avali, Genavieve Chick

Since the release of OpenAI’s ChatGPT, many companies have released their own large language models (LLMs) that engineers can use to improve the code development process. Although ChatGPT remains the most popular for general use cases, there are now models built specifically for programming, such as GitHub Copilot and Amazon Q Developer. Inspired by Mark Sherman's blog post analyzing the effectiveness of ChatGPT-3.5 for C code analysis, this post details our experiment testing and comparing GPT-3.5 and GPT-4o for C++ and Java code review. We collected examples from the SEI CERT Secure Coding standards for C++ and Java. Each rule in the standard contains a title, a description, noncompliant code examples, and compliant solutions. We analyzed whether ChatGPT-3.5 and ChatGPT-4o would correctly identify errors in noncompliant code and correctly recognize compliant code as error-free. Overall, we found that both models are better at identifying mistakes in noncompliant code than they are at confirming the correctness of compliant code. They can accurately discover and correct many errors but have a hard time recognizing compliant code as such. Comparing the two, we found that GPT-4o had higher correction rates on noncompliant code and hallucinated less when responding to compliant code. Both GPT-3.5 and GPT-4o were more successful at correcting coding errors in C++ than in Java. In categories where both models often missed errors, prompt engineering improved results by focusing the LLM on specific issues when providing fixes or suggestions for improvement.
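The evaluation described above can be sketched in code. This is a minimal, hypothetical harness, not the authors' actual implementation: the `CodeExample` type, `build_review_prompt`, and the keyword-based `score_response` grader are illustrative assumptions about how CERT rule examples might be fed to an LLM and how its verdicts might be checked against the standard's ground truth.

```python
# Hypothetical sketch of the experiment: pair each CERT code example
# with its ground-truth label, prompt a model for a review, and grade
# whether the model's verdict matches. Names here are illustrative.
from dataclasses import dataclass

@dataclass
class CodeExample:
    rule_id: str      # e.g., "OBJ01-J" (a real CERT Java rule ID)
    language: str     # "c++" or "java"
    code: str
    compliant: bool   # ground truth from the CERT standard

def build_review_prompt(example: CodeExample) -> str:
    """Ask the model to review a snippet without revealing the verdict."""
    return (
        f"Review the following {example.language} code for defects.\n"
        "If it is correct, reply 'no errors found'; otherwise describe "
        "each defect and suggest a fix.\n\n"
        f"```{example.language}\n{example.code}\n```"
    )

def score_response(example: CodeExample, response: str) -> bool:
    """Crude grading: did the model's verdict match the ground truth?"""
    said_clean = "no errors found" in response.lower()
    return said_clean == example.compliant

# Usage: send build_review_prompt(...) to an LLM API, then pass the
# reply to score_response and aggregate accuracy per rule category.
noncompliant = CodeExample(
    rule_id="OBJ01-J", language="java",
    code="public class Widget { public int total; }", compliant=False)
prompt = build_review_prompt(noncompliant)
```

Keeping the prompt builder and the grader separate makes it easy to run the same examples under different prompt-engineering variants, which is how the category-specific prompting improvements mentioned above could be measured.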

History

Publisher Statement

This material is based upon work funded and supported by the Department of Defense under Contract No. FA8702-15-D-0002 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center. The view, opinions, and/or findings contained in this material are those of the author(s) and should not be construed as an official Government position, policy, or decision, unless designated by other documentation. References herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by Carnegie Mellon University or its Software Engineering Institute. This report was prepared for the SEI Administrative Agent AFLCMC/AZS 5 Eglin Street Hanscom AFB, MA 01731-2100. NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN "AS-IS" BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT. [DISTRIBUTION STATEMENT A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US Government use and distribution.

Copyright Statement

Copyright 2024 Carnegie Mellon University.
