Carnegie Mellon University
PLATEAU23_eye_tracker_for_co_pilot.pdf (2.57 MB)

An Empirical Study of Developer Behaviors for Validating and Repairing AI-Generated Code

Conference contribution
Posted on 2023-03-30, 16:38. Authored by Ningzhi Tang, Meng Chen, Zheng Ning, Aakash Bansal, Yu Huang, Collin McMillan, Toby Jia-Jun Li

Recent advances in AI-based code generation tools such as GitHub Copilot show great promise in assisting developers with programming tasks. However, few empirical studies have used objective measures to investigate how programmers validate and repair Copilot-generated code. In this work, we conducted a user study with 9 participants, using eye tracking and IDE tracking to characterize how programmers handle errors when using Copilot. We found that developers exerted greater cognitive effort, but were less frustrated, when editing code than when understanding and navigating it. During the repair process, programmers frequently used prompts to generate code and accepted most of the generated code, yet they scrutinized both the prompt and the code for validation after accepting it. Finally, participants found several IDE features, such as Run, Debug, and GoToDeclaration, helpful for code validation.


This research was supported in part by an AnalytiXIN Faculty Fellowship, an NVIDIA Academic Hardware Grant, a Google Cloud Research Credit Award, a Google Research Scholar Award, and NSF grants CCF-2211428 and CCF-2100035.