Revisiting Copilot's Weaknesses: A Deep Dive into Security Issues in Code Generation

Too Long; Didn't Read

The replication study of Copilot's security analysis zoomed in on the diversity of weakness dimensions, revealing that a significant percentage of code suggestions remained vulnerable across various scenarios and languages, emphasizing ongoing challenges in AI-generated code security.
Authors:

(1) Vahid Majdinasab, Department of Computer and Software Engineering, Polytechnique Montreal, Canada;

(2) Michael Joshua Bishop, School of Mathematical and Computational Sciences, Massey University, New Zealand;

(3) Shawn Rasheed, Information & Communication Technology Group, UCOL - Te Pukenga, New Zealand;

(4) Arghavan Moradidakhel, Department of Computer and Software Engineering, Polytechnique Montreal, Canada;

(5) Amjed Tahir, School of Mathematical and Computational Sciences, Massey University, New Zealand;

(6) Foutse Khomh, Department of Computer and Software Engineering, Polytechnique Montreal, Canada.

Abstract and Introduction

Original Study

Replication Scope and Methodology

Results

Discussion

Related Work

Conclusion, Acknowledgments, and References

II. ORIGINAL STUDY

The authors of the original study prompt Copilot with code to answer the following questions: Are Copilot’s suggestions commonly insecure? What is the prevalence of insecure generated code? What factors of the “context” yield generated code that is more or less secure? The original study examines Copilot’s behavior along three dimensions: diversity of weakness, diversity of prompt, and diversity of domain. In this replication, we focus only on the diversity of weakness dimension. The original study constructs three scenarios for each of the “top 25” CWEs and uses CodeQL or manual inspection to identify security issues in the generated code. Across all axes and languages, 39.33% of the top options and 40.73% of all options were vulnerable. For Python specifically, 37.93% of the top options and 36.54% of all options were vulnerable.
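The difference between the "top" and "total" percentages can be made concrete with a minimal sketch. It assumes that "top" means the highest-ranked suggestion for each scenario and that every suggestion has already been labeled vulnerable or not (by CodeQL or manual inspection). The `Suggestion` record, the scenario identifiers, and the sample data are hypothetical and are not taken from either study.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    scenario: str     # hypothetical scenario id, e.g. "CWE-79-0"
    rank: int         # 0 = highest-ranked suggestion for that scenario (assumption)
    vulnerable: bool  # verdict from CodeQL or manual inspection


def vulnerability_rates(suggestions):
    """Return (top_rate, total_rate) as percentages.

    top_rate   : share of rank-0 suggestions that are vulnerable
    total_rate : share of all suggestions that are vulnerable
    Assumes the list is non-empty and contains at least one rank-0 entry.
    """
    top = [s for s in suggestions if s.rank == 0]
    top_rate = 100 * sum(s.vulnerable for s in top) / len(top)
    total_rate = 100 * sum(s.vulnerable for s in suggestions) / len(suggestions)
    return top_rate, total_rate


# Made-up labels for two scenarios; these are NOT results from the study.
sample = [
    Suggestion("CWE-79-0", 0, True),
    Suggestion("CWE-79-0", 1, False),
    Suggestion("CWE-89-0", 0, False),
    Suggestion("CWE-89-0", 1, True),
    Suggestion("CWE-89-0", 2, True),
]

top_rate, total_rate = vulnerability_rates(sample)
print(f"top: {top_rate:.2f}%  total: {total_rate:.2f}%")
```

Under this reading, the two figures can diverge because the top rate counts one suggestion per scenario, while the total rate weights each scenario by how many suggestions it produced.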


This paper is available on arXiv under a CC 4.0 license.