An Empirical Study of Developer Behaviors for Validating and Repairing AI-Generated Code
Recent advances in AI-based code generation tools such as GitHub Copilot show great promise in assisting developers with programming tasks. However, few empirical studies have used objective measures to investigate how programmers validate and repair Copilot-generated code. In this work, we conducted a user study with 9 participants, using eye tracking and IDE tracking to characterize how programmers handle errors when using Copilot. We found that developers exhibited greater cognitive effort, but less frustration, during the code-editing phase than during the understanding and navigation phases. Programmers frequently used prompts to generate code during the repair process and accepted most of the generated code, yet they continued to scrutinize the prompt and the code for validation after acceptance. Finally, participants found several IDE features, such as Run, Debug, and GoToDeclaration, helpful for code validation.