Home > Research > Publications & Outputs > The state of machine learning methodology in so...

Links

Text available via DOI:

View graph of relations

The state of machine learning methodology in software fault prediction

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSNConference contribution/Paperpeer-review

Published
Publication date2012
Host publicationProceedings 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012
PublisherIEEE
Pages308-313
Number of pages6
ISBN (print)9781467346511
<mark>Original language</mark>English

Abstract

The aim of this paper is to investigate the quality of methodology in software fault prediction studies using machine learning. Over two hundred studies of fault prediction have been published in the last 10 years. There is evidence to suggest that the quality of methodology used in some of these studies does not allow us to have confidence in the predictions reported by them. We evaluate the machine learning methodology used in 21 fault prediction studies. All of these studies use NASA data sets. We score each study from 1 to 10 in terms of the quality of their machine learning methodology (e.g. whether or not studies report randomising their cross validation folds). Only 10 out of the 21 studies scored 5 or more out of 10. Furthermore 1 study scored only 1 out of 10. When we plot these scores over time there is no evidence that the quality of machine learning methodology is better in recent studies. Our results suggest that there remains much to be done by both researchers and reviewers to improve the quality of machine learning methodology used in software fault prediction. We conclude that the results reported in some studies need to be treated with caution.