Reflections on the NASA MDP data sets

Research output: Contribution to Journal/Magazine, Journal article, peer-reviewed

Published

Standard

Reflections on the NASA MDP data sets. / Gray, D.; Bowes, D.; Davey, N. et al.
In: IET Software, Vol. 6, No. 6, 2012, p. 549-558.

Harvard

Gray, D, Bowes, D, Davey, N, Sun, Y & Christianson, B 2012, 'Reflections on the NASA MDP data sets', IET Software, vol. 6, no. 6, pp. 549-558. https://doi.org/10.1049/iet-sen.2011.0132

APA

Gray, D., Bowes, D., Davey, N., Sun, Y., & Christianson, B. (2012). Reflections on the NASA MDP data sets. IET Software, 6(6), 549-558. https://doi.org/10.1049/iet-sen.2011.0132

Vancouver

Gray D, Bowes D, Davey N, Sun Y, Christianson B. Reflections on the NASA MDP data sets. IET Software. 2012;6(6):549-558. doi: 10.1049/iet-sen.2011.0132

Author

Gray, D.; Bowes, D.; Davey, N. et al. / Reflections on the NASA MDP data sets. In: IET Software. 2012; Vol. 6, No. 6. pp. 549-558.

Bibtex

@article{9638f8935fbd41dfa4341e43c6c4a309,
title = "Reflections on the NASA MDP data sets",
abstract = "Background: The NASA metrics data program (MDP) data sets have been heavily used in software defect prediction research. Aim: To highlight the data quality issues present in these data sets, and the problems that can arise when they are used in a binary classification context. Method: A thorough exploration of all 13 original NASA data sets, followed by various experiments demonstrating the potential impact of duplicate data points when data mining. Conclusions: Firstly researchers need to analyse the data that forms the basis of their findings in the context of how it will be used. Secondly, the bulk of defect prediction experiments based on the NASA MDP data sets may have led to erroneous findings. This is mainly because of repeated/duplicate data points potentially causing substantial amounts of training and testing data to be identical.",
author = "D. Gray and D. Bowes and N. Davey and Y. Sun and B. Christianson",
year = "2012",
doi = "10.1049/iet-sen.2011.0132",
language = "English",
volume = "6",
pages = "549--558",
journal = "IET Software",
issn = "1751-8806",
publisher = "Institution of Engineering and Technology",
number = "6",

}

RIS

TY - JOUR

T1 - Reflections on the NASA MDP data sets

AU - Gray, D.

AU - Bowes, D.

AU - Davey, N.

AU - Sun, Y.

AU - Christianson, B.

PY - 2012

Y1 - 2012

N2 - Background: The NASA metrics data program (MDP) data sets have been heavily used in software defect prediction research. Aim: To highlight the data quality issues present in these data sets, and the problems that can arise when they are used in a binary classification context. Method: A thorough exploration of all 13 original NASA data sets, followed by various experiments demonstrating the potential impact of duplicate data points when data mining. Conclusions: Firstly researchers need to analyse the data that forms the basis of their findings in the context of how it will be used. Secondly, the bulk of defect prediction experiments based on the NASA MDP data sets may have led to erroneous findings. This is mainly because of repeated/duplicate data points potentially causing substantial amounts of training and testing data to be identical.

AB - Background: The NASA metrics data program (MDP) data sets have been heavily used in software defect prediction research. Aim: To highlight the data quality issues present in these data sets, and the problems that can arise when they are used in a binary classification context. Method: A thorough exploration of all 13 original NASA data sets, followed by various experiments demonstrating the potential impact of duplicate data points when data mining. Conclusions: Firstly researchers need to analyse the data that forms the basis of their findings in the context of how it will be used. Secondly, the bulk of defect prediction experiments based on the NASA MDP data sets may have led to erroneous findings. This is mainly because of repeated/duplicate data points potentially causing substantial amounts of training and testing data to be identical.

U2 - 10.1049/iet-sen.2011.0132

DO - 10.1049/iet-sen.2011.0132

M3 - Journal article

VL - 6

SP - 549

EP - 558

JO - IET Software

JF - IET Software

SN - 1751-8806

IS - 6

ER -