TY - GEN
T1 - The misuse of the NASA Metrics Data Program data sets for automated software defect prediction
AU - Gray, D.
AU - Bowes, D.
AU - Davey, N.
AU - Sun, Y.
AU - Christianson, B.
PY - 2011
Y1 - 2011
N2 - Background: The NASA Metrics Data Program data sets have been heavily used in software defect prediction experiments. Aim: To demonstrate and explain why these data sets require significant pre-processing in order to be suitable for defect prediction. Method: A meticulously documented data cleansing process involving all 13 of the original NASA data sets. Results: Following our novel data cleansing process, each of the data sets had between 6 and 90 percent fewer recorded values than it did originally. Conclusions: One: Researchers need to analyse the data that forms the basis of their findings in the context of how it will be used. Two: Defect prediction data sets could benefit from lower-level code metrics in addition to those more commonly used, as these will help to distinguish modules and reduce the likelihood of repeated data points. Three: The bulk of defect prediction experiments based on the NASA Metrics Data Program data sets may have led to erroneous findings. This is mainly because repeated data points can cause substantial amounts of training and testing data to be identical.
AB - Background: The NASA Metrics Data Program data sets have been heavily used in software defect prediction experiments. Aim: To demonstrate and explain why these data sets require significant pre-processing in order to be suitable for defect prediction. Method: A meticulously documented data cleansing process involving all 13 of the original NASA data sets. Results: Following our novel data cleansing process, each of the data sets had between 6 and 90 percent fewer recorded values than it did originally. Conclusions: One: Researchers need to analyse the data that forms the basis of their findings in the context of how it will be used. Two: Defect prediction data sets could benefit from lower-level code metrics in addition to those more commonly used, as these will help to distinguish modules and reduce the likelihood of repeated data points. Three: The bulk of defect prediction experiments based on the NASA Metrics Data Program data sets may have led to erroneous findings. This is mainly because repeated data points can cause substantial amounts of training and testing data to be identical.
U2 - 10.1049/ic.2011.0012
DO - 10.1049/ic.2011.0012
M3 - Conference contribution/Paper
SP - 96
EP - 103
BT - 15th Annual Conference on Evaluation & Assessment in Software Engineering (EASE 2011)
PB - IEEE
ER -