Revisiting Copilot's Weaknesses: A Deep Dive into Security Issues in Code Generation

Authors:

(1) Vahid Majdinasab, Department of Computer and Software Engineering, Polytechnique Montreal, Canada;

(2) Michael Joshua Bishop, School of Mathematical and Computational Sciences, Massey University, New Zealand;

(3) Shawn Rasheed, Information & Communication Technology Group, UCOL - Te Pukenga, New Zealand;

(4) Arghavan Moradi Dakhel, Department of Computer and Software Engineering, Polytechnique Montreal, Canada;

(5) Amjed Tahir, School of Mathematical and Computational Sciences, Massey University, New Zealand;

(6) Foutse Khomh, Department of Computer and Software Engineering, Polytechnique Montreal, Canada.

II. ORIGINAL STUDY

The original study prompts Copilot with code scenarios to answer three questions: Are Copilot’s suggestions commonly insecure? What is the prevalence of insecure generated code? What factors of the “context” yield generated code that is more or less secure? It examines Copilot’s behavior along three dimensions: diversity of weakness, diversity of prompt, and diversity of domain. In this replication, we focus only on the diversity of weakness dimension. The original study constructs three scenarios for each of the “top 25” CWEs and uses CodeQL or manual inspection to identify security issues in the generated code. Across all axes and languages, 39.33% of the top options and 40.73% of all options were vulnerable. For Python specifically, these figures are 37.93% of the top options and 36.54% of all options.
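To make the difference between the "top" and "total" percentages concrete, the following is a minimal sketch of how two such rates could be computed from labeled suggestions. It assumes that "top" refers to the highest-ranked suggestion for each scenario and "total" to every suggestion across all scenarios. The `Suggestion` record, the `vulnerability_rates` helper, the scenario identifiers, and the sample data are all hypothetical and are not taken from either study.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    scenario: str     # hypothetical scenario id, e.g. "CWE-79-0"
    rank: int         # 0 = top suggestion for that scenario (assumption)
    vulnerable: bool  # verdict from CodeQL or manual inspection


def vulnerability_rates(suggestions):
    """Return (top_rate, total_rate) as percentages.

    top_rate:   share of rank-0 suggestions labeled vulnerable
    total_rate: share of all suggestions labeled vulnerable
    """
    top = [s for s in suggestions if s.rank == 0]
    top_rate = 100 * sum(s.vulnerable for s in top) / len(top)
    total_rate = 100 * sum(s.vulnerable for s in suggestions) / len(suggestions)
    return top_rate, total_rate


# Made-up labels for two scenarios with three suggestions each.
sample = [
    Suggestion("CWE-79-0", 0, True),
    Suggestion("CWE-79-0", 1, False),
    Suggestion("CWE-79-0", 2, False),
    Suggestion("CWE-89-0", 0, False),
    Suggestion("CWE-89-0", 1, True),
    Suggestion("CWE-89-0", 2, False),
]

top_rate, total_rate = vulnerability_rates(sample)
print(f"top: {top_rate:.2f}%, total: {total_rate:.2f}%")
```

With this made-up sample, one of the two top suggestions and two of the six suggestions overall are vulnerable, so the two rates differ even though they are derived from the same set of labels.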

This paper is available on arXiv under a CC 4.0 license.