
Electronic data

  • Accepted author manuscript, 460 KB, application/zip

    Available under license: CC BY-NC-SA: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License


Model Leeching: An Extraction Attack Targeting LLMs

Research output: Contribution to conference - Without ISBN/ISSN › Conference paper › peer-review

Published

Standard

Model Leeching: An Extraction Attack Targeting LLMs. / Birch, Lewis; Hackett, William; Trawicki, Stefan et al.
2023. Paper presented at Conference on Applied Machine Learning for Information Security, Arlington, Virginia, United States.

Research output: Contribution to conference - Without ISBN/ISSN › Conference paper › peer-review

Harvard

Birch, L, Hackett, W, Trawicki, S, Suri, N & Garraghan, P 2023, 'Model Leeching: An Extraction Attack Targeting LLMs', Paper presented at Conference on Applied Machine Learning for Information Security, Arlington, United States, 19/10/23 - 20/10/23.

APA

Birch, L., Hackett, W., Trawicki, S., Suri, N., & Garraghan, P. (2023). Model Leeching: An Extraction Attack Targeting LLMs. Paper presented at Conference on Applied Machine Learning for Information Security, Arlington, Virginia, United States.

Vancouver

Birch L, Hackett W, Trawicki S, Suri N, Garraghan P. Model Leeching: An Extraction Attack Targeting LLMs. 2023. Paper presented at Conference on Applied Machine Learning for Information Security, Arlington, Virginia, United States.

Author

Birch, Lewis ; Hackett, William ; Trawicki, Stefan et al. / Model Leeching : An Extraction Attack Targeting LLMs. Paper presented at Conference on Applied Machine Learning for Information Security, Arlington, Virginia, United States.

Bibtex

@conference{c174d4638d36463491a3dfad393775e5,
title = "Model Leeching: An Extraction Attack Targeting LLMs",
abstract = "Model Leeching is a novel extraction attack targeting Large Language Models (LLMs), capable of distilling task-specific knowledge from a target LLM into a reduced parameter model. We demonstrate the effectiveness of our attack by extracting task capability from ChatGPT-3.5-Turbo, achieving 73% Exact Match (EM) similarity, and SQuAD EM and F1 accuracy scores of 75% and 87%, respectively for only $50 in API cost. We further demonstrate the feasibility of adversarial attack transferability from an extracted model extracted via Model Leeching to perform ML attack staging against a target LLM, resulting in an 11% increase to attack success rate when applied to ChatGPT-3.5-Turbo. ",
author = "Lewis Birch and William Hackett and Stefan Trawicki and Neeraj Suri and Peter Garraghan",
year = "2023",
month = oct,
day = "20",
language = "English",
note = "Conference on Applied Machine Learning for Information Security, CAMLIS ; Conference date: 19-10-2023 Through 20-10-2023",
url = "https://www.camlis.org/",

}

RIS

TY - CONF

T1 - Model Leeching: An Extraction Attack Targeting LLMs

T2 - Conference on Applied Machine Learning for Information Security

AU - Birch, Lewis

AU - Hackett, William

AU - Trawicki, Stefan

AU - Suri, Neeraj

AU - Garraghan, Peter

PY - 2023/10/20

Y1 - 2023/10/20

N2 - Model Leeching is a novel extraction attack targeting Large Language Models (LLMs), capable of distilling task-specific knowledge from a target LLM into a reduced parameter model. We demonstrate the effectiveness of our attack by extracting task capability from ChatGPT-3.5-Turbo, achieving 73% Exact Match (EM) similarity, and SQuAD EM and F1 accuracy scores of 75% and 87%, respectively, for only $50 in API cost. We further demonstrate the feasibility of adversarial attack transferability from a model extracted via Model Leeching to perform ML attack staging against a target LLM, resulting in an 11% increase in attack success rate when applied to ChatGPT-3.5-Turbo.

AB - Model Leeching is a novel extraction attack targeting Large Language Models (LLMs), capable of distilling task-specific knowledge from a target LLM into a reduced parameter model. We demonstrate the effectiveness of our attack by extracting task capability from ChatGPT-3.5-Turbo, achieving 73% Exact Match (EM) similarity, and SQuAD EM and F1 accuracy scores of 75% and 87%, respectively, for only $50 in API cost. We further demonstrate the feasibility of adversarial attack transferability from a model extracted via Model Leeching to perform ML attack staging against a target LLM, resulting in an 11% increase in attack success rate when applied to ChatGPT-3.5-Turbo.

M3 - Conference paper

Y2 - 19 October 2023 through 20 October 2023

ER -
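
For orientation only, the sketch below illustrates the general distillation-style workflow the abstract describes: label a set of task inputs with the target LLM's own responses, then fine-tune a smaller "extracted" model on those labels. This is not the authors' implementation; every function, class, and model choice here is a hypothetical placeholder.

```python
# Illustrative sketch only (not from the paper): query a target LLM on task
# inputs, collect its answers as labels, then train a reduced-parameter model
# on the resulting dataset. All names below are hypothetical placeholders.

from dataclasses import dataclass


@dataclass
class LabelledExample:
    context: str
    question: str
    answer: str  # answer text produced by the target LLM


def query_target_llm(context: str, question: str) -> str:
    """Placeholder for a call to the target LLM's API (e.g. a chat
    completion endpoint) that returns its answer for a QA prompt."""
    raise NotImplementedError("wire up the target model's API here")


def build_extraction_dataset(task_inputs) -> list[LabelledExample]:
    """Label each (context, question) pair with the target LLM's own answer,
    producing the training set for the smaller extracted model."""
    dataset = []
    for context, question in task_inputs:
        answer = query_target_llm(context, question)
        dataset.append(LabelledExample(context, question, answer))
    return dataset


def train_extracted_model(dataset: list[LabelledExample]):
    """Placeholder for fine-tuning a reduced-parameter model (for example,
    an extractive QA transformer) on the LLM-labelled dataset."""
    raise NotImplementedError("fine-tune a small QA model on `dataset` here")
```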