Improving the Performance of Code Vulnerability Prediction using Abstract Syntax Tree Information

Computing and Communications

Text available via DOI:

https://doi.org/10.1145/3558489.3559066
Final published version

Keywords

Software Security, Software Vulnerability, Machine Learning

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Publication date	17/11/2022
Host publication	PROMISE 2022: Proceedings of the 18th International Conference on Predictive Models and Data Analytics in Software Engineering
Editors	Shane McIntosh, Weiyi Shang, Gema Rodriguez Perez
Place of Publication	New York
Publisher	ACM
Pages	2-11
Number of pages	10
ISBN (electronic)	9781450398602
Original language	English
Event	18th International Conference on Predictive Models and Data Analytics in Software Engineering - Singapore, Singapore Duration: 17/11/2022 → … Conference number: 22

Conference

Conference	18th International Conference on Predictive Models and Data Analytics in Software Engineering
Abbreviated title	PROMISE
Country/Territory	Singapore
City	Singapore
Period	17/11/22 → …

Abstract

The recent emergence of the Log4Shell vulnerability demonstrates the importance of detecting code vulnerabilities in software systems. Software Vulnerability Prediction Models (VPMs) are a promising tool for vulnerability detection. Recent studies have focused on improving the performance of models to predict whether a piece of code is vulnerable or not (binary classification). However, such approaches are limited because they do not provide developers with information on the type of vulnerability that needs to be patched. We present our multiclass classification approach to improve the performance of vulnerability prediction models. Our approach uses abstract syntax tree (AST) n-grams to identify code clusters related to specific vulnerabilities. We evaluated our approach using real-world Java software vulnerability data. We report increased predictive performance compared to a variety of other models; for example, F-measure increases from 55% to 75% and MCC increases from 48% to 74%. Our results suggest that clustering software vulnerabilities using AST n-gram information is a promising approach to improve vulnerability prediction and enable specific information about the vulnerability type to be provided.
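The abstract does not say how the AST n-grams are constructed, so the following is only a minimal sketch of one common reading: flatten the tree into a pre-order sequence of node types and count sliding windows of length n. The tuple-based tree, the function names `node_types` and `ast_ngrams`, and the node type labels (chosen to resemble Java AST node names) are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from typing import Iterator

# Hypothetical, simplified AST representation: (node_type, [children]).
# A real pipeline would obtain this from a Java parser; that step is omitted.


def node_types(node) -> Iterator[str]:
    """Yield node type names in depth-first pre-order."""
    node_type, children = node
    yield node_type
    for child in children:
        yield from node_types(child)


def ast_ngrams(root, n: int) -> Counter:
    """Count n-grams over the pre-order sequence of node types."""
    seq = list(node_types(root))
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


# Toy tree loosely resembling a Java method containing two method calls.
tree = ("MethodDeclaration", [
    ("BlockStmt", [
        ("ExpressionStmt", [
            ("MethodCallExpr", [("NameExpr", []), ("StringLiteralExpr", [])]),
        ]),
        ("ExpressionStmt", [
            ("MethodCallExpr", [("NameExpr", []), ("NameExpr", [])]),
        ]),
    ]),
])

bigrams = ast_ngrams(tree, 2)
for gram, count in bigrams.most_common(3):
    print(count, " > ".join(gram))
```

Counts like these could serve as a feature vector per code unit, which a clustering or multiclass classification step would then consume; how the paper performs that step is not described in the abstract.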
