Electronic data

  • ESEM2016_paper_196

    Rights statement: © ACM, 2016. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ESEM '16 Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement http://dx.doi.org/10.1145/2961111.2962620

    Accepted author manuscript, 200 KB, PDF document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Links

Text available via DOI: http://dx.doi.org/10.1145/2961111.2962620

So You Need More Method Level Datasets for Your Software Defect Prediction?: Voilà!

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published
Publication date: 8/09/2016
Host publication: ESEM '16 Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement
Place of publication: New York
Publisher: IEEE Computer Society
Number of pages: 6
ISBN (electronic): 9781450344272
Original language: English
Event: 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2016 - Ciudad Real, Spain
Duration: 8/09/2016 – 9/09/2016

Conference

Conference: 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2016
Country/Territory: Spain
City: Ciudad Real
Period: 8/09/16 – 9/09/16

Abstract

Context: Defect prediction research is based on a small number of defect datasets, most of which are at the class rather than the method level. Consequently, our knowledge of defects is limited. Identifying defect datasets for prediction is not easy, and extracting quality data from identified datasets is even more difficult. Goal: Identify open source Java systems suitable for defect prediction and extract high quality fault data from them. Method: We used Boa to identify candidate open source systems. We reduced 50,000 potential candidates down to 23 suitable for defect prediction, using selection criteria based on each system's software repository and defect tracking system. We used an enhanced SZZ algorithm to extract fault information and calculated metrics using JHawk. Result: We have produced 138 fault and metrics datasets for the 23 identified systems. We make these datasets (the ELFF datasets) and our data extraction tools freely available to future researchers. Conclusions: The data we provide enables future studies to proceed with minimal effort. Our datasets significantly increase the pool of systems currently being used in defect analysis studies.
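
The authors' enhanced SZZ variant is not detailed in this record, but the core idea of the original SZZ algorithm (Śliwerski, Zimmermann and Zeller, MSR 2005) can be sketched briefly: find bug-fixing commits by their messages, take the lines those fixes deleted, and blame them in the fix's parent commit to recover candidate bug-introducing commits. The sketch below is a minimal illustration against a local git clone; the repository path and the fix-keyword pattern are assumptions, and it does not reproduce the authors' enhancements, their Boa-based system selection, or the JHawk metric extraction.

```python
import re
import subprocess

REPO = "/path/to/java/system"            # assumption: local clone of one studied system
FIX_PATTERN = r"\b(fix(e[sd])?|bug)\b"   # assumption: keywords that mark fix commits

def git(*args):
    """Run a git command inside REPO and return its stdout."""
    return subprocess.run(["git", "-C", REPO, *args],
                          capture_output=True, text=True, check=True).stdout

def fix_commits():
    """SZZ step 1: commits whose messages look like bug fixes."""
    log = git("log", "--pretty=%H %s")
    return [line.split()[0] for line in log.splitlines()
            if re.search(FIX_PATTERN, line, re.IGNORECASE)]

def deleted_lines(fix):
    """SZZ step 2: yield (file, old line number) pairs removed by a fix commit."""
    diff = git("show", "--unified=0", "--pretty=format:", fix)
    path, old = None, None
    for line in diff.splitlines():
        if line.startswith("--- a/"):
            path = line[len("--- a/"):]
        elif line.startswith("@@"):
            # hunk header: @@ -old_start[,count] +new_start[,count] @@
            old = int(re.match(r"@@ -(\d+)", line).group(1))
        elif line.startswith("-") and not line.startswith("---"):
            if path is not None and old is not None:
                yield path, old
                old += 1

def bug_introducing(fix):
    """SZZ step 3: blame each deleted line in the fix's parent commit."""
    candidates = set()
    for path, lineno in deleted_lines(fix):
        blame = git("blame", "--porcelain", "-L", f"{lineno},{lineno}",
                    f"{fix}^", "--", path)
        candidates.add(blame.split()[0])  # first token is the blamed commit hash
    return candidates

if __name__ == "__main__":
    for fix in fix_commits()[:10]:        # sample a few fixes for illustration
        print(fix, "<-", sorted(bug_introducing(fix)))
```

A production implementation would also handle renamed files, root commits and noise filters (for example, ignoring whitespace-only or comment-only changes), which are the kinds of refinements an enhanced SZZ variant typically adds.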
