Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
}
TY - GEN
T1 - Improving the Performance of Code Vulnerability Prediction using Abstract Syntax Tree Information
AU - Debeyan, Fahad
AU - Hall, Tracy
AU - Bowes, David
N1 - Conference code: 22
PY - 2022/11/17
Y1 - 2022/11/17
N2 - The recent emergence of the Log4jshell vulnerability demonstrates the importance of detecting code vulnerabilities in software systems. Software Vulnerability Prediction Models (VPMs) are a promising tool for vulnerability detection. Recent studies have focused on improving the performance of models to predict whether a piece of code is vulnerable or not (binary classification). However, such approaches are limited because they do not provide developers with information on the type of vulnerability that needs to be patched. We present our multiclass classification approach to improve the performance of vulnerability prediction models. Our approach uses abstract syntax tree n-grams to identify code clusters related to specific vulnerabilities. We evaluated our approach using real-world Java software vulnerability data. We report increased predictive performance compared to a variety of other models, for example, F-measure increases from 55% to 75% and MCC increases from 48% to 74%. Our results suggest that clustering software vulnerabilities using AST n-gram information is a promising approach to improve vulnerability prediction and enable specific information about the vulnerability type to be provided.
AB - The recent emergence of the Log4jshell vulnerability demonstrates the importance of detecting code vulnerabilities in software systems. Software Vulnerability Prediction Models (VPMs) are a promising tool for vulnerability detection. Recent studies have focused on improving the performance of models to predict whether a piece of code is vulnerable or not (binary classification). However, such approaches are limited because they do not provide developers with information on the type of vulnerability that needs to be patched. We present our multiclass classification approach to improve the performance of vulnerability prediction models. Our approach uses abstract syntax tree n-grams to identify code clusters related to specific vulnerabilities. We evaluated our approach using real-world Java software vulnerability data. We report increased predictive performance compared to a variety of other models, for example, F-measure increases from 55% to 75% and MCC increases from 48% to 74%. Our results suggest that clustering software vulnerabilities using AST n-gram information is a promising approach to improve vulnerability prediction and enable specific information about the vulnerability type to be provided.
KW - Software Security
KW - Software Vulnerability
KW - Machine Learning
U2 - 10.1145/3558489.3559066
DO - 10.1145/3558489.3559066
M3 - Conference contribution/Paper
SP - 2
EP - 11
BT - PROMISE 2022: Proceedings of the 18th International Conference on Predictive Models and Data Analytics in Software Engineering
A2 - McIntosh, Shane
A2 - Shang, Weiyi
A2 - Rodríguez-Pérez, Gema
PB - ACM
CY - New York
T2 - 18th International Conference on Predictive Models and Data Analytics in Software Engineering
Y2 - 17 November 2022
ER -