Modern code generation tools use AI models, particularly Large Language Models (LLMs), to generate functional and complete code. While such tools are becoming popular and widely available to developers, their use is often accompanied by security challenges that can lead to insecure code being merged into the code base. It is therefore important to assess the quality of the generated code, especially in terms of its security. Researchers have recently explored various aspects of code generation tools, including security; one study looked at the case of GitHub Copilot.
However, many open questions about the security of the generated code require further investigation, especially the security issues of automatically generated code in the wild.
GitHub describes Copilot as the AI equivalent of “pair programming,” in which two developers work together at a single computer. The idea is that one developer can contribute new ideas or spot problems that the other might have missed, even if it requires more person-hours.
In practice, however, Copilot is more of a time-saving utility, integrating resources that developers would otherwise have to seek elsewhere. As developers type, the tool suggests code snippets that can be inserted with the click of a button. This way, they don’t have to spend time combing through API documentation or hunting for code examples on specialized sites.
In the study titled Security Weaknesses of Copilot Generated Code in GitHub, researchers conducted an empirical study by analyzing security weaknesses in code snippets generated by GitHub Copilot that are part of public projects hosted on GitHub. The goal was to investigate which types of security issues occur, and how prevalent they are, in real-world code rather than in artificially constructed scenarios.
To this end, they identified 435 code snippets generated by GitHub Copilot from publicly available projects. They then performed an in-depth security analysis to identify Common Weakness Enumeration (CWE) instances in these code snippets.
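To make concrete what a CWE instance in generated code can look like, here is a minimal, hypothetical sketch (not drawn from the study's dataset) of CWE-89, SQL injection, one of the classic weakness categories that CWE-based analyses flag, alongside the parameterized fix:

```python
import sqlite3

def find_user_insecure(conn, username):
    # Vulnerable (CWE-89): untrusted input is interpolated directly
    # into the SQL string, so crafted input can alter the query logic.
    query = f"SELECT id FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_secure(conn, username):
    # Safe: a parameterized query treats the input strictly as data.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (username,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

# A crafted input turns the insecure query into "match every row",
# while the parameterized version matches nothing:
payload = "x' OR '1'='1"
print(len(find_user_insecure(conn, payload)))  # 1 (the injected condition matches)
print(len(find_user_secure(conn, payload)))    # 0 (no user has that literal name)
```

Static analyzers and manual CWE audits of the kind the researchers performed look for exactly this sort of pattern, where generated code compiles and works for benign inputs but fails under adversarial ones.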