Revisiting Copilot's Weaknesses: A Deep Dive into Security Issues in Code Generation

Written by gitflow | Published 2024/05/28

TLDR The replication study of Copilot's security analysis zoomed in on the diversity of weakness dimension, revealing that a significant percentage of code suggestions remained vulnerable across various scenarios and languages, emphasizing ongoing challenges in AI-generated code security.

Authors:

(1) Vahid Majdinasab, Department of Computer and Software Engineering, Polytechnique Montreal, Canada;

(2) Michael Joshua Bishop, School of Mathematical and Computational Sciences, Massey University, New Zealand;

(3) Shawn Rasheed, Information & Communication Technology Group, UCOL - Te Pukenga, New Zealand;

(4) Arghavan Moradi Dakhel, Department of Computer and Software Engineering, Polytechnique Montreal, Canada;

(5) Amjed Tahir, School of Mathematical and Computational Sciences, Massey University, New Zealand;

(6) Foutse Khomh, Department of Computer and Software Engineering, Polytechnique Montreal, Canada.

Table of Links

Abstract and Introduction

Original Study

Replication Scope and Methodology

Results

Discussion

Related Work

Conclusion, Acknowledgments, and References

II. ORIGINAL STUDY

The authors of the original study prompt Copilot with code to answer the following questions: Are Copilot’s suggestions commonly insecure? What is the prevalence of insecure generated code? What factors of the “context” yield generated code that is more or less secure? The original study examines Copilot’s behavior along three dimensions: diversity of weakness, diversity of prompt, and diversity of domain. In this replication, we focus only on the diversity of weakness dimension. The original study constructs three scenarios for each of the “top 25” CWEs and uses CodeQL or manual inspection to identify security issues in the generated code. Across all axes and languages, 39.33% of the top options and 40.73% of all options were vulnerable. For Python specifically, these figures are 37.93% of the top options and 36.54% of all options.
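To make the distinction between the "top" and "total" figures concrete, the following is a minimal sketch of how two such rates could be computed from per-scenario results. It assumes that "top" means the share of scenarios whose highest-ranked suggestion was flagged as vulnerable, and that "total" means the share of all suggestions flagged as vulnerable. The `Scenario` class, the `vulnerability_rates` function, and the sample data are hypothetical and are not taken from either study.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Scenario:
    """One prompt scenario for a given CWE (hypothetical structure)."""
    cwe: str
    # One flag per suggestion, ordered from top-ranked to lowest-ranked.
    # True means the suggestion was judged vulnerable (e.g. by a static
    # analyzer or by manual inspection).
    vulnerable_flags: List[bool]


def vulnerability_rates(scenarios: List[Scenario]) -> Tuple[float, float]:
    """Return (top_rate, total_rate) as percentages.

    Assumption: scenarios with no suggestions are skipped entirely.
    """
    scored = [s for s in scenarios if s.vulnerable_flags]
    if not scored:
        raise ValueError("no scenarios with suggestions")

    # "Top": how many scenarios have a vulnerable first-ranked suggestion.
    top_vulnerable = sum(1 for s in scored if s.vulnerable_flags[0])

    # "Total": how many suggestions overall are vulnerable.
    all_flags = [flag for s in scored for flag in s.vulnerable_flags]

    top_rate = 100 * top_vulnerable / len(scored)
    total_rate = 100 * sum(all_flags) / len(all_flags)
    return top_rate, total_rate


# Made-up illustration data; these are NOT results from the study.
example = [
    Scenario("CWE-79", [True, False, False, True]),
    Scenario("CWE-89", [False, False, True, False]),
    Scenario("CWE-22", [True, True, False, False]),
    Scenario("CWE-20", [False, False, False, False]),
]

top_rate, total_rate = vulnerability_rates(example)
print(f"top: {top_rate:.2f}%, total: {total_rate:.2f}%")
# prints "top: 50.00%, total: 31.25%"
```

With the sample data, two of the four scenarios have a vulnerable top suggestion, while five of the sixteen suggestions overall are vulnerable, which shows how the two rates can differ for the same set of scenarios.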

This paper is available on arXiv under a CC 4.0 license.


Published by HackerNoon on 2024/05/28