Electronic data

  • nasa_paper

    Rights statement: © 2016 ACM. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in EASE '16 Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering http://dx.doi.org/10.1145/2915970.2916007

    Accepted author manuscript, 183 KB, PDF document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

The Jinx on the NASA software defect data sets

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN; Conference contribution/Paper; peer-review

Published

Standard

The Jinx on the NASA software defect data sets. / Petrić, Jean; Bowes, David; Hall, Tracy et al.
EASE '16 Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering. New York: Association for Computing Machinery, Inc, 2016. 13.

Harvard

Petrić, J, Bowes, D, Hall, T, Christianson, B & Baddoo, N 2016, The Jinx on the NASA software defect data sets. in EASE '16 Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering, 13, Association for Computing Machinery, Inc, New York, 20th International Conference on Evaluation and Assessment in Software Engineering, EASE 2016, Limerick, Ireland, 1/06/16. https://doi.org/10.1145/2915970.2916007

APA

Petrić, J., Bowes, D., Hall, T., Christianson, B., & Baddoo, N. (2016). The Jinx on the NASA software defect data sets. In EASE '16 Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering (Article 13). Association for Computing Machinery, Inc. https://doi.org/10.1145/2915970.2916007

Vancouver

Petrić J, Bowes D, Hall T, Christianson B, Baddoo N. The Jinx on the NASA software defect data sets. In EASE '16 Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering. New York: Association for Computing Machinery, Inc. 2016. 13. doi: 10.1145/2915970.2916007

Author

Petrić, Jean ; Bowes, David ; Hall, Tracy et al. / The Jinx on the NASA software defect data sets. EASE '16 Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering. New York : Association for Computing Machinery, Inc, 2016.

Bibtex

@inproceedings{b7b1bd0a76284afcb74be0ef74a18b6d,
title = "The Jinx on the NASA software defect data sets",
abstract = "Background: The NASA datasets have previously been used extensively in studies of software defects. In 2013 Shepperd et al. presented an essential set of rules for removing erroneous data from the NASA datasets making this data more reliable to use. Objective: We have now found additional rules necessary for removing problematic data which were not identified by Shepperd et al. Results: In this paper, we demonstrate the level of erroneous data still present even after cleaning using Shepperd et al.'s rules and apply our new rules to remove this erroneous data. Conclusion: Even after systematic data cleaning of the NASA MDP datasets, we found new erroneous data. Data quality should always be explicitly considered by researchers before use.",
keywords = "Data quality, Machine learning, Software defect prediction",
author = "Jean Petri{\'c} and David Bowes and Tracy Hall and Bruce Christianson and Nathan Baddoo",
note = "{\textcopyright} 2016 ACM. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in EASE '16 Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering http://dx.doi.org/10.1145/2915970.2916007; 20th International Conference on Evaluation and Assessment in Software Engineering, EASE 2016 ; Conference date: 01-06-2016 Through 03-06-2016",
year = "2016",
month = jun,
day = "1",
doi = "10.1145/2915970.2916007",
language = "English",
booktitle = "EASE '16 Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering",
publisher = "Association for Computing Machinery, Inc",

}

RIS

TY  - GEN
T1  - The Jinx on the NASA software defect data sets
AU  - Petrić, Jean
AU  - Bowes, David
AU  - Hall, Tracy
AU  - Christianson, Bruce
AU  - Baddoo, Nathan
N1  - © 2016 ACM. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in EASE '16 Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering http://dx.doi.org/10.1145/2915970.2916007
PY  - 2016/6/1
Y1  - 2016/6/1
N2  - Background: The NASA datasets have previously been used extensively in studies of software defects. In 2013 Shepperd et al. presented an essential set of rules for removing erroneous data from the NASA datasets making this data more reliable to use. Objective: We have now found additional rules necessary for removing problematic data which were not identified by Shepperd et al. Results: In this paper, we demonstrate the level of erroneous data still present even after cleaning using Shepperd et al.'s rules and apply our new rules to remove this erroneous data. Conclusion: Even after systematic data cleaning of the NASA MDP datasets, we found new erroneous data. Data quality should always be explicitly considered by researchers before use.
AB  - Background: The NASA datasets have previously been used extensively in studies of software defects. In 2013 Shepperd et al. presented an essential set of rules for removing erroneous data from the NASA datasets making this data more reliable to use. Objective: We have now found additional rules necessary for removing problematic data which were not identified by Shepperd et al. Results: In this paper, we demonstrate the level of erroneous data still present even after cleaning using Shepperd et al.'s rules and apply our new rules to remove this erroneous data. Conclusion: Even after systematic data cleaning of the NASA MDP datasets, we found new erroneous data. Data quality should always be explicitly considered by researchers before use.
KW  - Data quality
KW  - Machine learning
KW  - Software defect prediction
U2  - 10.1145/2915970.2916007
DO  - 10.1145/2915970.2916007
M3  - Conference contribution/Paper
AN  - SCOPUS:84978484033
BT  - EASE '16 Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering
PB  - Association for Computing Machinery, Inc
CY  - New York
T2  - 20th International Conference on Evaluation and Assessment in Software Engineering, EASE 2016
Y2  - 1 June 2016 through 3 June 2016
ER  -