Just-in-Time Security Patch Detection -- LLM At the Rescue for Data Augmentation

Computing and Communications

Electronic data

Accepted author manuscript, 3.42 MB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Keywords

Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence

Research output: Working paper › Preprint

Published

Standard

Just-in-Time Security Patch Detection -- LLM At the Rescue for Data Augmentation. / Tang, Xunzhu; Chen, Zhenghan; Kim, Kisub et al. 2023.

Bibtex

@techreport{b9773baac812439f800d10c6fd09aa8a,

title = "Just-in-Time Security Patch Detection -- LLM At the Rescue for Data Augmentation",

abstract = "In the face of growing vulnerabilities found in open-source software, the need to identify {discreet} security patches has become paramount. The lack of consistency in how software providers handle maintenance often leads to the release of security patches without comprehensive advisories, leaving users vulnerable to unaddressed security risks. To address this pressing issue, we introduce a novel security patch detection system, LLMDA, which capitalizes on Large Language Models (LLMs) and code-text alignment methodologies for patch review, data enhancement, and feature combination. Within LLMDA, we initially utilize LLMs for examining patches and expanding data of PatchDB and SPI-DB, two security patch datasets from recent literature. We then use labeled instructions to direct our LLMDA, differentiating patches based on security relevance. Following this, we apply a PTFormer to merge patches with code, formulating hybrid attributes that encompass both the innate details and the interconnections between the patches and the code. This distinctive combination method allows our system to capture more insights from the combined context of patches and code, hence improving detection precision. Finally, we devise a probabilistic batch contrastive learning mechanism within batches to augment the capability of our LLMDA in discerning security patches. The results reveal that LLMDA significantly surpasses the state-of-the-art techniques in detecting security patches, underscoring its promise in fortifying software maintenance.",

keywords = "Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence",

author = "Xunzhu Tang and Zhenghan Chen and Kisub Kim and Haoye Tian and Saad Ezzini and Jacques Klein",

year = "2023",

month = dec,

day = "1",

language = "English",

type = "WorkingPaper",

}
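The abstract names a "probabilistic batch contrastive learning mechanism" but does not define it. As a rough illustration of the general idea only, the sketch below implements the standard in-batch supervised contrastive (SupCon-style) loss, where samples with the same label (for example, security vs. non-security patch) are pulled together and all other samples in the batch act as negatives. This is a swapped-in, non-probabilistic technique, not LLMDA's actual loss; the function names and the temperature default of 0.1 are hypothetical choices made for this example.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0.0 else list(vec)


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,  # hypothetical default, not taken from the paper
) -> float:
    """In-batch supervised contrastive loss (SupCon-style), averaged over anchors.

    Illustrative sketch only; it is NOT the probabilistic variant used by LLMDA.
    """
    if len(embeddings) != len(labels):
        raise ValueError("embeddings and labels must have the same length")
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        # Temperature-scaled cosine similarity between the anchor and each other sample.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in others}
        # Numerically stable log-sum-exp over all non-anchor samples (the denominator).
        peak = max(sims.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        # Mean negative log-probability of each positive for this anchor.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

With the embeddings [[1, 0], [1, 0.1], [0, 1], [0.1, 1]], the labels [1, 1, 0, 0] (similar vectors share a label) give a much smaller loss than the labels [1, 0, 1, 0], which is the behaviour a contrastive objective is meant to reward.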

RIS

TY  - UNPB
T1  - Just-in-Time Security Patch Detection -- LLM At the Rescue for Data Augmentation
AU  - Tang, Xunzhu
AU  - Chen, Zhenghan
AU  - Kim, Kisub
AU  - Tian, Haoye
AU  - Ezzini, Saad
AU  - Klein, Jacques
PY  - 2023/12/1
Y1  - 2023/12/1
N2  - In the face of growing vulnerabilities found in open-source software, the need to identify discreet security patches has become paramount. The lack of consistency in how software providers handle maintenance often leads to the release of security patches without comprehensive advisories, leaving users vulnerable to unaddressed security risks. To address this pressing issue, we introduce a novel security patch detection system, LLMDA, which capitalizes on Large Language Models (LLMs) and code-text alignment methodologies for patch review, data enhancement, and feature combination. Within LLMDA, we initially utilize LLMs for examining patches and expanding data of PatchDB and SPI-DB, two security patch datasets from recent literature. We then use labeled instructions to direct our LLMDA, differentiating patches based on security relevance. Following this, we apply a PTFormer to merge patches with code, formulating hybrid attributes that encompass both the innate details and the interconnections between the patches and the code. This distinctive combination method allows our system to capture more insights from the combined context of patches and code, hence improving detection precision. Finally, we devise a probabilistic batch contrastive learning mechanism within batches to augment the capability of our LLMDA in discerning security patches. The results reveal that LLMDA significantly surpasses the state-of-the-art techniques in detecting security patches, underscoring its promise in fortifying software maintenance.
AB  - In the face of growing vulnerabilities found in open-source software, the need to identify discreet security patches has become paramount. The lack of consistency in how software providers handle maintenance often leads to the release of security patches without comprehensive advisories, leaving users vulnerable to unaddressed security risks. To address this pressing issue, we introduce a novel security patch detection system, LLMDA, which capitalizes on Large Language Models (LLMs) and code-text alignment methodologies for patch review, data enhancement, and feature combination. Within LLMDA, we initially utilize LLMs for examining patches and expanding data of PatchDB and SPI-DB, two security patch datasets from recent literature. We then use labeled instructions to direct our LLMDA, differentiating patches based on security relevance. Following this, we apply a PTFormer to merge patches with code, formulating hybrid attributes that encompass both the innate details and the interconnections between the patches and the code. This distinctive combination method allows our system to capture more insights from the combined context of patches and code, hence improving detection precision. Finally, we devise a probabilistic batch contrastive learning mechanism within batches to augment the capability of our LLMDA in discerning security patches. The results reveal that LLMDA significantly surpasses the state-of-the-art techniques in detecting security patches, underscoring its promise in fortifying software maintenance.
KW  - Computer Science - Cryptography and Security
KW  - Computer Science - Artificial Intelligence
M3  - Preprint
BT  - Just-in-Time Security Patch Detection -- LLM At the Rescue for Data Augmentation
ER  - 
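The RIS export is a line-oriented format: each line carries a two-character tag, a separator (two spaces, a hyphen, a space), and a value; TY opens a record, ER closes it, and tags such as AU and KW may repeat. The Python sketch below reads that layout into one dictionary per record. It is a minimal illustration, not a complete RIS implementation: it ignores continuation lines and encoding details, tolerates a single space before the hyphen because copies pasted from web pages often collapse the spacing, and the name parse_ris is hypothetical.

```python
import re
from typing import Dict, List, Optional

# Tagged RIS line: two-character tag, whitespace, a hyphen, an optional space, the value.
# The RIS convention is exactly two spaces ("TY  - "); one space is also accepted here.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s?(.*)$")


def parse_ris(text: str) -> List[Dict[str, List[str]]]:
    """Split RIS text into records, mapping each tag to its list of values."""
    records: List[Dict[str, List[str]]] = []
    current: Optional[Dict[str, List[str]]] = None
    for raw in text.splitlines():
        match = RIS_LINE.match(raw.rstrip())
        if match is None:
            continue  # blank or untagged lines are skipped in this sketch
        tag, value = match.group(1), match.group(2)
        if tag == "TY":
            current = {}  # TY opens a new record
        if current is None:
            continue  # ignore tags that appear outside a record
        if tag == "ER":
            records.append(current)  # ER closes the record
            current = None
            continue
        # Repeatable tags such as AU and KW accumulate in their original order.
        current.setdefault(tag, []).append(value)
    return records
```

Applied to the record above, parse_ris would return a single dictionary whose AU entry lists the six authors in order.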