So You Need More Method Level Datasets for Your Software Defect Prediction? - Research Portal

Electronic data

ESEM2016_paper_196
Rights statement: © ACM, 2016. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ESEM '16 Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement http://dx.doi.org/10.1145/2961111.2962620
Accepted author manuscript, 200 KB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Text available via DOI:

https://doi.org/10.1145/2961111.2962620
Final published version

Keywords

Boa, Data Mining, Defect linking, Defect Prediction, Defects

So You Need More Method Level Datasets for Your Software Defect Prediction?: Voilà!

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

So You Need More Method Level Datasets for Your Software Defect Prediction? Voilà! / Shippey, Thomas; Hall, Tracy; Counsell, Steve et al.
ESEM '16 Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. New York: IEEE Computer Society, 2016. 12.

Harvard

Shippey, T, Hall, T, Counsell, S & Bowes, D 2016, So You Need More Method Level Datasets for Your Software Defect Prediction? Voilà! in ESEM '16 Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 12, IEEE Computer Society, New York, 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2016, Ciudad Real, Spain, 8/09/16. https://doi.org/10.1145/2961111.2962620

APA

Shippey, T., Hall, T., Counsell, S., & Bowes, D. (2016). So You Need More Method Level Datasets for Your Software Defect Prediction? Voilà! In ESEM '16 Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (Article 12). IEEE Computer Society. https://doi.org/10.1145/2961111.2962620

Vancouver

Shippey T, Hall T, Counsell S, Bowes D. So You Need More Method Level Datasets for Your Software Defect Prediction? Voilà! In ESEM '16 Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. New York: IEEE Computer Society. 2016. 12 doi: 10.1145/2961111.2962620

Author

Shippey, Thomas ; Hall, Tracy ; Counsell, Steve et al. / So You Need More Method Level Datasets for Your Software Defect Prediction? Voilà! ESEM '16 Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. New York: IEEE Computer Society, 2016.

Bibtex

@inproceedings{7c8e7aed6f2741a5ba12150978dbe1a1,
title = "So You Need More Method Level Datasets for Your Software Defect Prediction?: Voil{\`a}!",
abstract = "Context: Defect prediction research is based on a small number of defect datasets, and most are at class rather than method level. Consequently, our knowledge of defects is limited. Identifying defect datasets for prediction is not easy, and extracting quality data from identified datasets is even more difficult. Goal: Identify open source Java systems suitable for defect prediction and extract high-quality fault data from these datasets. Method: We used Boa to identify candidate open source systems. We reduce 50,000 potential candidates down to 23 suitable for defect prediction using selection criteria based on the system's software repository and its defect tracking system. We use an enhanced SZZ algorithm to extract fault information and calculate metrics using JHawk. Result: We have produced 138 fault and metrics datasets for the 23 identified systems. We make these datasets (the ELFF datasets) and our data extraction tools freely available to future researchers. Conclusions: The data we provide enables future studies to proceed with minimal effort. Our datasets significantly increase the pool of systems currently being used in defect analysis studies.",
keywords = "Boa, Data Mining, Defect linking, Defect Prediction, Defects",
author = "Thomas Shippey and Tracy Hall and Steve Counsell and David Bowes",
note = "{\textcopyright} ACM, 2016. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ESEM '16 Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement http://dx.doi.org/10.1145/2961111.2962620; 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2016 ; Conference date: 08-09-2016 through 09-09-2016",
year = "2016",
month = sep,
day = "8",
doi = "10.1145/2961111.2962620",
language = "English",
booktitle = "ESEM '16 Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement",
publisher = "IEEE Computer Society",
}

RIS

TY  - GEN
T1  - So You Need More Method Level Datasets for Your Software Defect Prediction?
T2  - 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2016
AU  - Shippey, Thomas
AU  - Hall, Tracy
AU  - Counsell, Steve
AU  - Bowes, David
N1  - © ACM, 2016. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ESEM '16 Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement http://dx.doi.org/10.1145/2961111.2962620
PY  - 2016/9/8
Y1  - 2016/9/8
N2  - Context: Defect prediction research is based on a small number of defect datasets, and most are at class rather than method level. Consequently, our knowledge of defects is limited. Identifying defect datasets for prediction is not easy, and extracting quality data from identified datasets is even more difficult. Goal: Identify open source Java systems suitable for defect prediction and extract high-quality fault data from these datasets. Method: We used Boa to identify candidate open source systems. We reduce 50,000 potential candidates down to 23 suitable for defect prediction using selection criteria based on the system's software repository and its defect tracking system. We use an enhanced SZZ algorithm to extract fault information and calculate metrics using JHawk. Result: We have produced 138 fault and metrics datasets for the 23 identified systems. We make these datasets (the ELFF datasets) and our data extraction tools freely available to future researchers. Conclusions: The data we provide enables future studies to proceed with minimal effort. Our datasets significantly increase the pool of systems currently being used in defect analysis studies.
AB  - Context: Defect prediction research is based on a small number of defect datasets, and most are at class rather than method level. Consequently, our knowledge of defects is limited. Identifying defect datasets for prediction is not easy, and extracting quality data from identified datasets is even more difficult. Goal: Identify open source Java systems suitable for defect prediction and extract high-quality fault data from these datasets. Method: We used Boa to identify candidate open source systems. We reduce 50,000 potential candidates down to 23 suitable for defect prediction using selection criteria based on the system's software repository and its defect tracking system. We use an enhanced SZZ algorithm to extract fault information and calculate metrics using JHawk. Result: We have produced 138 fault and metrics datasets for the 23 identified systems. We make these datasets (the ELFF datasets) and our data extraction tools freely available to future researchers. Conclusions: The data we provide enables future studies to proceed with minimal effort. Our datasets significantly increase the pool of systems currently being used in defect analysis studies.
KW  - Boa
KW  - Data Mining
KW  - Defect linking
KW  - Defect Prediction
KW  - Defects
U2  - 10.1145/2961111.2962620
DO  - 10.1145/2961111.2962620
M3  - Conference contribution/Paper
AN  - SCOPUS:84991583839
BT  - ESEM '16 Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement
PB  - IEEE Computer Society
CY  - New York
Y2  - 8 September 2016 through 9 September 2016
ER  - 
