Improving the Performance of Code Vulnerability Prediction using Abstract Syntax Tree Information

Computing and Communications

Text available via DOI:

https://doi.org/10.1145/3558489.3559066
Final published version

Keywords

Software Security, Software Vulnerability, Machine Learning

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Improving the Performance of Code Vulnerability Prediction using Abstract Syntax Tree Information. / Debeyan, Fahad ; Hall, Tracy ; Bowes, David.
PROMISE 2022: Proceedings of the 18th International Conference on Predictive Models and Data Analytics in Software Engineering. ed. / Shane McIntosh; Weiyi Shang; Gema Rodriguez Perez. New York: ACM, 2022. p. 2-11.

Harvard

Debeyan, F, Hall, T & Bowes, D 2022, Improving the Performance of Code Vulnerability Prediction using Abstract Syntax Tree Information. in S McIntosh, W Shang & GR Perez (eds), PROMISE 2022: Proceedings of the 18th International Conference on Predictive Models and Data Analytics in Software Engineering. ACM, New York, pp. 2-11, 18th International Conference on Predictive Models and Data Analytics in Software Engineering, Singapore, Singapore, 17/11/22. https://doi.org/10.1145/3558489.3559066

APA

Debeyan, F., Hall, T., & Bowes, D. (2022). Improving the Performance of Code Vulnerability Prediction using Abstract Syntax Tree Information. In S. McIntosh, W. Shang, & G. R. Perez (Eds.), PROMISE 2022: Proceedings of the 18th International Conference on Predictive Models and Data Analytics in Software Engineering (pp. 2-11). ACM. https://doi.org/10.1145/3558489.3559066

Vancouver

Debeyan F, Hall T, Bowes D. Improving the Performance of Code Vulnerability Prediction using Abstract Syntax Tree Information. In McIntosh S, Shang W, Perez GR, editors, PROMISE 2022: Proceedings of the 18th International Conference on Predictive Models and Data Analytics in Software Engineering. New York: ACM. 2022. p. 2-11. Epub 2022 Nov 9. doi: 10.1145/3558489.3559066

Author

Debeyan, Fahad ; Hall, Tracy ; Bowes, David. / Improving the Performance of Code Vulnerability Prediction using Abstract Syntax Tree Information. PROMISE 2022: Proceedings of the 18th International Conference on Predictive Models and Data Analytics in Software Engineering. editor / Shane McIntosh ; Weiyi Shang ; Gema Rodriguez Perez. New York : ACM, 2022. pp. 2-11

Bibtex

@inproceedings{9ed27960785241ecafdb4a76c61cb128,

title = "Improving the Performance of Code Vulnerability Prediction using Abstract Syntax Tree Information",

abstract = "The recent emergence of the Log4jshell vulnerability demonstrates the importance of detecting code vulnerabilities in software systems. Software Vulnerability Prediction Models (VPMs) are a promising tool for vulnerability detection. Recent studies have focused on improving the performance of models to predict whether a piece of code is vulnerable or not (binary classification). However, such approaches are limited because they do not provide developers with information on the type of vulnerability that needs to be patched. We present our multiclass classification approach to improve the performance of vulnerability prediction models. Our approach uses abstract syntax tree n-grams to identify code clusters related to specific vulnerabilities. We evaluated our approach using real-world Java software vulnerability data. We report increased predictive performance compared to a variety of other models, for example, F-measure increases from 55% to 75% and MCC increases from 48% to 74%. Our results suggest that clustering software vulnerabilities using AST n-gram information is a promising approach to improve vulnerability prediction and enable specific information about the vulnerability type to be provided.",

keywords = "Software Security, Software Vulnerability, Machine Learning",

author = "Fahad Debeyan and Tracy Hall and David Bowes",

year = "2022",

month = nov,

day = "17",

doi = "10.1145/3558489.3559066",

language = "English",

pages = "2--11",

editor = "Shane McIntosh and Weiyi Shang and Perez, {Gema Rodriguez}",

booktitle = "PROMISE 2022: Proceedings of the 18th International Conference on Predictive Models and Data Analytics in Software Engineering",

publisher = "ACM",

note = "18th International Conference on Predictive Models and Data Analytics in Software Engineering, PROMISE ; Conference date: 17-11-2022",

}

RIS

TY - GEN

T1 - Improving the Performance of Code Vulnerability Prediction using Abstract Syntax Tree Information

AU - Debeyan, Fahad

AU - Hall, Tracy

AU - Bowes, David

N1 - Conference code: 22

PY - 2022/11/17

Y1 - 2022/11/17

AB - The recent emergence of the Log4jshell vulnerability demonstrates the importance of detecting code vulnerabilities in software systems. Software Vulnerability Prediction Models (VPMs) are a promising tool for vulnerability detection. Recent studies have focused on improving the performance of models to predict whether a piece of code is vulnerable or not (binary classification). However, such approaches are limited because they do not provide developers with information on the type of vulnerability that needs to be patched. We present our multiclass classification approach to improve the performance of vulnerability prediction models. Our approach uses abstract syntax tree n-grams to identify code clusters related to specific vulnerabilities. We evaluated our approach using real-world Java software vulnerability data. We report increased predictive performance compared to a variety of other models, for example, F-measure increases from 55% to 75% and MCC increases from 48% to 74%. Our results suggest that clustering software vulnerabilities using AST n-gram information is a promising approach to improve vulnerability prediction and enable specific information about the vulnerability type to be provided.

KW - Software Security

KW - Software Vulnerability

KW - Machine Learning

U2 - 10.1145/3558489.3559066

DO - 10.1145/3558489.3559066

M3 - Conference contribution/Paper

SP - 2

EP - 11

BT - PROMISE 2022: Proceedings of the 18th International Conference on Predictive Models and Data Analytics in Software Engineering

A2 - McIntosh, Shane

A2 - Shang, Weiyi

A2 - Perez, Gema Rodriguez

PB - ACM

CY - New York

T2 - 18th International Conference on Predictive Models and Data Analytics in Software Engineering

Y2 - 17 November 2022

ER -
