Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
Publication date | 21/10/2015
---|---
Host publication | PROMISE '15 Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering
Place of Publication | New York
Publisher | Association for Computing Machinery, Inc
ISBN (electronic) | 9781450337151
Original language | English
Event | 11th International Conference on Predictive Models and Data Analytics in Software Engineering, PROMISE 2015 - Beijing, China. Duration: 21/10/2015 → …
Conference | 11th International Conference on Predictive Models and Data Analytics in Software Engineering, PROMISE 2015
---|---
Country/Territory | China
City | Beijing
Period | 21/10/15 → …
Software defect prediction performance varies over a large range. Menzies suggested there is a ceiling effect of 80% Recall [8]. Most of the datasets used are highly imbalanced. This paper asks: what is the empirical effect of using datasets with varying levels of imbalance on predictive performance? We use data synthesised by a previous meta-analysis of 600 fault prediction models and their results. Four model evaluation measures (the Matthews Correlation Coefficient (MCC), F-Measure, Precision and Recall) are compared against the corresponding data imbalance ratio. When the data are imbalanced, the predictive performance of software defect prediction studies is low. As the data become more balanced, the predictive performance of prediction models increases from an average MCC of 0.15 until the minority class makes up 20% of the instances in the dataset, at which point the MCC reaches an average value of about 0.34. As the proportion of the minority class increases above 20%, the predictive performance does not significantly increase. Using datasets in which more than 20% of the instances are defective has no significant impact on predictive performance as measured by MCC. We conclude that comparisons of the results of defect prediction studies should take the imbalance of the data into account.
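For concreteness, here is a minimal sketch of how the four evaluation measures and the minority-class (defective) proportion can be computed from a binary confusion matrix. The function name and the example counts are illustrative assumptions, not taken from the paper or its meta-analysis data:

```python
import math

def prediction_measures(tp, fp, tn, fn):
    """Compute Precision, Recall, F-Measure and MCC from the counts of
    true/false positives and negatives, plus the proportion of the
    minority (defective) class in the dataset."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Defective instances are the positives: tp + fn of the total.
    total = tp + fp + tn + fn
    minority = (tp + fn) / total if total else 0.0
    return {"Precision": precision, "Recall": recall,
            "F-Measure": f_measure, "MCC": mcc,
            "Minority-class proportion": minority}

# Hypothetical example: a heavily imbalanced dataset where only
# 10% of 1000 modules are defective.
print(prediction_measures(tp=30, fp=40, tn=860, fn=70))
```

Unlike Precision, Recall and F-Measure, MCC uses all four cells of the confusion matrix, which is one reason it is the focus of the imbalance comparison above.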