
The state of machine learning methodology in software fault prediction

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

The state of machine learning methodology in software fault prediction. / Hall, T.; Bowes, D.
Proceedings 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012. IEEE, 2012. p. 308-313.

Harvard

Hall, T & Bowes, D 2012, The state of machine learning methodology in software fault prediction. in Proceedings 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012. IEEE, pp. 308-313. https://doi.org/10.1109/ICMLA.2012.226

APA

Hall, T., & Bowes, D. (2012). The state of machine learning methodology in software fault prediction. In Proceedings 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012 (pp. 308-313). IEEE. https://doi.org/10.1109/ICMLA.2012.226

Vancouver

Hall T, Bowes D. The state of machine learning methodology in software fault prediction. In Proceedings 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012. IEEE, 2012. p. 308-313. doi: 10.1109/ICMLA.2012.226

Author

Hall, T. ; Bowes, D. / The state of machine learning methodology in software fault prediction. Proceedings 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012. IEEE, 2012. pp. 308-313

Bibtex

@inproceedings{e9cae4444b7b42acb3ca178f6c48fb27,
title = "The state of machine learning methodology in software fault prediction",
abstract = "The aim of this paper is to investigate the quality of methodology in software fault prediction studies using machine learning. Over two hundred studies of fault prediction have been published in the last 10 years. There is evidence to suggest that the quality of methodology used in some of these studies does not allow us to have confidence in the predictions reported by them. We evaluate the machine learning methodology used in 21 fault prediction studies. All of these studies use NASA data sets. We score each study from 1 to 10 in terms of the quality of their machine learning methodology (e.g. whether or not studies report randomising their cross validation folds). Only 10 out of the 21 studies scored 5 or more out of 10. Furthermore 1 study scored only 1 out of 10. When we plot these scores over time there is no evidence that the quality of machine learning methodology is better in recent studies. Our results suggest that there remains much to be done by both researchers and reviewers to improve the quality of machine learning methodology used in software fault prediction. We conclude that the results reported in some studies need to be treated with caution.",
author = "T. Hall and D. Bowes",
year = "2012",
doi = "10.1109/ICMLA.2012.226",
language = "English",
isbn = "9781467346511",
pages = "308--313",
booktitle = "Proceedings 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012",
publisher = "IEEE",

}

RIS

TY - GEN

T1 - The state of machine learning methodology in software fault prediction

AU - Hall, T.

AU - Bowes, D.

PY - 2012

Y1 - 2012

N2 - The aim of this paper is to investigate the quality of methodology in software fault prediction studies using machine learning. Over two hundred studies of fault prediction have been published in the last 10 years. There is evidence to suggest that the quality of methodology used in some of these studies does not allow us to have confidence in the predictions they report. We evaluate the machine learning methodology used in 21 fault prediction studies. All of these studies use NASA data sets. We score each study from 1 to 10 in terms of the quality of its machine learning methodology (e.g. whether or not studies report randomising their cross-validation folds). Only 10 of the 21 studies scored 5 or more out of 10, and one study scored only 1 out of 10. When we plot these scores over time, there is no evidence that the quality of machine learning methodology is better in recent studies. Our results suggest that there remains much to be done by both researchers and reviewers to improve the quality of machine learning methodology used in software fault prediction. We conclude that the results reported in some studies need to be treated with caution.

AB - The aim of this paper is to investigate the quality of methodology in software fault prediction studies using machine learning. Over two hundred studies of fault prediction have been published in the last 10 years. There is evidence to suggest that the quality of methodology used in some of these studies does not allow us to have confidence in the predictions they report. We evaluate the machine learning methodology used in 21 fault prediction studies. All of these studies use NASA data sets. We score each study from 1 to 10 in terms of the quality of its machine learning methodology (e.g. whether or not studies report randomising their cross-validation folds). Only 10 of the 21 studies scored 5 or more out of 10, and one study scored only 1 out of 10. When we plot these scores over time, there is no evidence that the quality of machine learning methodology is better in recent studies. Our results suggest that there remains much to be done by both researchers and reviewers to improve the quality of machine learning methodology used in software fault prediction. We conclude that the results reported in some studies need to be treated with caution.

U2 - 10.1109/ICMLA.2012.226

DO - 10.1109/ICMLA.2012.226

M3 - Conference contribution/Paper

SN - 9781467346511

SP - 308

EP - 313

BT - Proceedings 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012

PB - IEEE

ER -
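
Note on the methodology criterion

The abstract's example criterion, whether a study reports randomising its cross-validation folds, can be illustrated with a minimal sketch. The snippet below is not from the paper; it assumes scikit-learn and a synthetic stand-in for a fault data set, and simply shows shuffled, seeded 10-fold cross-validation so that fold assignment is both randomised and reproducible, and therefore reportable.

# Minimal sketch (assumption: scikit-learn; not the paper's code).
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a fault-prediction data set such as the NASA sets.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# shuffle=True randomises fold membership; a fixed random_state makes the
# randomisation reproducible, so it can be reported and replicated.
folds = KFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=folds)
print(f"Mean accuracy over 10 randomised folds: {scores.mean():.3f}")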