An Empirical Study of Developer Behaviors for Validating and Repairing AI-Generated Code
Recent advances in AI-based code generation tools such as GitHub Copilot show great promise in assisting developers with programming tasks. However, few empirical studies have used objective measures to investigate how programmers validate and repair Copilot-generated code. In this work, we conducted a user study with 9 participants, using eye tracking and IDE tracking to characterize how programmers handle errors when using Copilot. We found that developers exhibited greater cognitive effort, but less frustration, during the code-editing phase than during the understanding and navigation phases. Programmers frequently used prompts to generate code during the repair process and accepted most of the generated code, yet they continued to scrutinize the prompt and the code for validation after acceptance. Finally, participants found several IDE features, such as Run, Debug, and GoToDeclaration, helpful for code validation.