Improving the Performance of Code Vulnerability Prediction using Abstract Syntax Tree Information

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN, Conference contribution/Paper, peer-review

Published
Publication date: 17/11/2022
Host publication: PROMISE 2022: Proceedings of the 18th International Conference on Predictive Models and Data Analytics in Software Engineering
Editors: Shane McIntosh, Weiyi Shang, Gema Rodriguez Perez
Place of Publication: New York
Publisher: ACM
Pages: 2-11
Number of pages: 10
ISBN (electronic): 9781450398602
Original language: English
Event: 18th International Conference on Predictive Models and Data Analytics in Software Engineering - Singapore, Singapore
Duration: 17/11/2022 → …
Conference number: 22

Conference

Conference: 18th International Conference on Predictive Models and Data Analytics in Software Engineering
Abbreviated title: PROMISE
Country/Territory: Singapore
City: Singapore
Period: 17/11/22 → …

Abstract

The recent emergence of the Log4Shell vulnerability demonstrates the importance of detecting code vulnerabilities in software systems. Software Vulnerability Prediction Models (VPMs) are a promising tool for vulnerability detection. Recent studies have focused on improving the performance of models that predict whether a piece of code is vulnerable or not (binary classification). However, such approaches are limited because they do not tell developers which type of vulnerability needs to be patched. We present a multiclass classification approach to improve the performance of vulnerability prediction models. Our approach uses abstract syntax tree (AST) n-grams to identify code clusters related to specific vulnerabilities. We evaluated our approach using real-world Java software vulnerability data. We report increased predictive performance compared to a variety of other models: for example, F-measure increases from 55% to 75% and MCC (Matthews correlation coefficient) increases from 48% to 74%. Our results suggest that clustering software vulnerabilities using AST n-gram information is a promising approach to improving vulnerability prediction and to providing specific information about the vulnerability type.
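The abstract does not spell out how the AST n-grams are built. What follows is a minimal sketch that assumes one common construction. The syntax tree is flattened into a pre-order sequence of node-type names, and contiguous n-grams are counted over that sequence. The study targets Java, so a real pipeline would need a Java parser. Purely for illustration, this sketch uses Python's built-in `ast` module on Python source instead. Several choices here are hypothetical and are not the authors' method:

- the function names `node_type_sequence`, `ast_ngrams` and `cosine_similarity`;
- the default `n=3`;
- the use of cosine similarity to compare n-gram profiles, as one possible input to a clustering step.

```python
import ast
import math
from collections import Counter


def node_type_sequence(source: str) -> list[str]:
    """Flatten a Python AST into node-type names in pre-order (depth-first) order."""
    tree = ast.parse(source)
    sequence: list[str] = []

    def visit(node: ast.AST) -> None:
        # Record only the node's type (e.g. "FunctionDef", "BinOp"); identifiers are ignored
        sequence.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return sequence


def ast_ngrams(source: str, n: int = 3) -> Counter:
    """Count contiguous n-grams of node types (hypothetical feature extractor)."""
    seq = node_type_sequence(source)
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Similarity of two n-gram profiles; an assumed measure a clustering step could use."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as dissimilar to everything
    return dot / (norm_a * norm_b)
```

For example, `ast_ngrams("def f(x):\n    return x + 1\n", n=2)` contains the bigram `("Return", "BinOp")`. Only node types are recorded, so two snippets with the same structure but different identifiers or literals score 1.0 under `cosine_similarity`.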