Carnegie Mellon University

Automated API Refactoring for Evolving Codebases

thesis
posted on 2025-10-24, 18:23 authored by Daniel Ramos
Modern software development depends heavily on third-party libraries and frameworks, which expose their functionality through APIs and bring substantial productivity gains. However, as libraries evolve to meet new technical or market demands, clients must often adapt their code to accommodate breaking changes or even newer libraries. This form of software maintenance, known as API refactoring, is a time-consuming and error-prone task, which has led to significant interest in automating it. A common approach to automating API refactoring is to mine historical data from client repositories to extract match-replace rules. However, these approaches are limited by the availability of high-quality examples: many clients do not refactor in public, and those that do leave insufficient traces to learn from.

This thesis presents a set of alternative methods for learning API migration rules without requiring large-scale mining of client code. Instead, we explore three complementary sources of information: documentation, the API development process, and natural language. First, we use API documentation to infer mappings between old and new APIs, which guide the synthesis of migration scripts. Second, we extract migration knowledge from the evolution of the library itself, especially from pull requests that introduce breaking changes and update internal tests. Finally, we show that large language models trained on natural language artifacts can be used to generate migration examples, which are then validated and generalized into reusable scripts. We operationalize these ideas in four refactoring tools, each targeting a different aspect of the problem. These tools combine program synthesis with machine learning to synthesize and apply migrations automatically. We evaluated our techniques on real-world Python libraries and synthetic benchmarks, showing that it is possible to automate migration effectively using only indirect sources of information, without requiring curated datasets or repository mining.

History

Date

2025-08-20

Degree Type

  • Dissertation

Thesis Department

  • Software and Societal Systems (S3D)

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

  • Claire Le Goues
  • Ruben Martins
  • Vasco Manquinho
