Carnegie Mellon University

Code Modernization Techniques Using Clang-Tidy for C23 Checked Arithmetic

thesis
posted on 2025-05-21, 19:37, authored by John Samuels

The first major integer safety issue occurred in 1975 with the DATE75 bug, where dates past January 4th, 1975 could not be represented using a 12-bit integer. Almost 50 years later, integer safety bugs are still an issue programmers must consider when writing code. The root cause lies in how computers represent numbers: in binary, using a fixed number of bits. In C, when a value is placed in a variable that does not have enough bits to store it, integer overflow or wraparound occurs. This is essentially a truncation operation and reflects how the underlying hardware operates. Integer overflow occurs when the result does not fit in a signed integer type, and wraparound occurs when it does not fit in an unsigned integer type. Although unsigned wraparound is defined by the C standard, whereas signed overflow is undefined behavior, both can produce unexpected values and develop into security vulnerabilities. To combat these issues, the C standards committee added a set of checked arithmetic macros to C23 (ckd_add, ckd_sub, and ckd_mul, declared in <stdckdint.h>) that can be used to ensure that integer operations do not overflow or unintentionally wrap around. This thesis explores the application of these macros and proposes a compiler-assisted approach to automatically updating older codebases.
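The distinction between wraparound and checked arithmetic can be illustrated with a minimal sketch. It is not taken from the thesis: the function names wrapping_add_u8 and checked_add_int are hypothetical, and the fallback to the GCC/Clang builtin __builtin_add_overflow is an assumption made so the example also builds on toolchains that do not yet ship <stdckdint.h>.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Use the C23 header when the toolchain provides it. */
#if defined(__has_include)
#  if __has_include(<stdckdint.h>)
#    include <stdckdint.h>
#  endif
#endif
#ifndef ckd_add
/* Assumption: fall back to the GCC/Clang builtin when the header is missing. */
#  define ckd_add(r, a, b) __builtin_add_overflow((a), (b), (r))
#endif

/* Unsigned wraparound: well defined, the result is reduced modulo 2^8. */
uint8_t wrapping_add_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}

/* Checked signed addition (hypothetical helper).
 * ckd_add returns true when the mathematical result does not fit in *out,
 * so this helper returns true only when the sum is representable. */
bool checked_add_int(int a, int b, int *out) {
    return !ckd_add(out, a, b);
}
```

With these definitions, wrapping_add_u8(250, 10) silently yields 4, while checked_add_int(INT_MAX, 1, &r) reports failure instead of invoking undefined behavior.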

There are a number of issues related to using the new checked arithmetic macros, and even more when trying to automatically rewrite expressions. When a rewrite needs to be performed, a set of temporary variables needs to be defined based on the types involved in the original expression. These types may be non-obvious, since they may be declared elsewhere or implicitly converted at some point. To address this issue, type information is collected from the compiler to ensure that these temporary variables reflect the types in the statement itself and the destination. This approach was implemented as a clang-tidy check using the tools and APIs available in the LLVM compiler infrastructure project. Using these tools allowed the clang-tidy check to match on statements that could be rewritten, then suggest a fix that selects the intermediate types based on what the compiler was using.
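As a hedged illustration of why the temporary types matter, consider a statement such as total = count * size + header, where the operands are uint16_t. The sketch below is a hand-written example of what a rewrite could look like, not the actual output of the thesis's clang-tidy check. The function name total_checked and the variable tmp_mul are hypothetical, and the builtin fallback is again an assumption for toolchains without <stdckdint.h>.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__has_include)
#  if __has_include(<stdckdint.h>)
#    include <stdckdint.h>
#  endif
#endif
/* Assumption: fall back to GCC/Clang builtins when the C23 header is missing. */
#ifndef ckd_add
#  define ckd_add(r, a, b) __builtin_add_overflow((a), (b), (r))
#endif
#ifndef ckd_mul
#  define ckd_mul(r, a, b) __builtin_mul_overflow((a), (b), (r))
#endif

/* Original statement (unchecked):
 *     *total = count * size + header;
 * On typical platforms the uint16_t operands are promoted to int, so the
 * multiplication happens in int even though no int appears in the source. */
bool total_checked(uint16_t count, uint16_t size, uint16_t header,
                   uint16_t *total) {
    int tmp_mul; /* temporary typed like the promoted subexpression */

    if (ckd_mul(&tmp_mul, count, size)) {
        return false; /* intermediate product does not fit in int */
    }
    if (ckd_add(total, tmp_mul, header)) {
        return false; /* final sum does not fit in the destination type */
    }
    return true;
}
```

The temporary takes the type the compiler uses for the subexpression, while the final macro call checks against the destination type, so both a too-large intermediate product and a too-large final value are reported.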

History

Date

2025-04-29

Degree Type

  • Master's Thesis

Thesis Department

  • Information Networking Institute

Degree Name

  • Master of Science (MS)

Advisor(s)

Patrick Tague
