Electronic data

  • 2017haynesphd

    Final published version, 1.84 MB, PDF document

    Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Detecting abrupt changes in big data

Research output: Thesis (Doctoral Thesis)

Published
Publication date: 2017
Number of pages: 186
Qualification: PhD
Awarding Institution
Supervisors/Advisors
Publisher
  • Lancaster University
Original language: English

Abstract

This thesis develops methods for changepoint detection that can be used in the realm of Big Data. In particular, we develop methods that scale to the volume of data now readily collected and stored, and that are versatile enough to handle the different varieties of data.

A well-established approach to detecting changes uses penalised optimisation, where the choice of penalty has a huge impact on the performance of the method. In the first part of this thesis we propose an algorithm, CROPS (Changepoints over a Range of PenaltieS), which finds the optimal solutions for a range of penalties rather than for a single specified penalty.
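To make the role of the penalty concrete, the following is a minimal sketch of penalised optimisation by dynamic programming. It is not CROPS and is not taken from the thesis: the squared-error cost for a change in mean, the function names `segment_cost` and `optimal_partitioning`, and the toy series `y` are all assumptions chosen for illustration. It simply re-solves the problem for a few hand-picked penalties to show how the number of detected changes depends on that choice.

```python
from itertools import accumulate


def segment_cost(csum, csum2, s, t):
    """Assumed cost: sum of squared deviations from the mean of y[s:t]."""
    n = t - s
    total = csum[t] - csum[s]
    return (csum2[t] - csum2[s]) - total * total / n


def optimal_partitioning(y, penalty):
    """Changepoints minimising total segment cost + penalty per change."""
    n = len(y)
    csum = [0.0] + list(accumulate(y))
    csum2 = [0.0] + list(accumulate(v * v for v in y))
    best = [0.0] * (n + 1)  # best[t]: optimal penalised cost of y[:t]
    best[0] = -penalty      # so that the first segment is not penalised
    last = [0] * (n + 1)    # last[t]: start of the final segment of y[:t]
    for t in range(1, n + 1):
        best[t], last[t] = min(
            (best[s] + segment_cost(csum, csum2, s, t) + penalty, s)
            for s in range(t)
        )
    # Walk back through the stored segment starts to recover the changes.
    changes, t = [], n
    while last[t] > 0:
        t = last[t]
        changes.append(t)
    return sorted(changes)


# Toy series (hypothetical): one large and one small shift in mean.
y = [0.0] * 10 + [5.0] * 10 + [5.5] * 10
for penalty in (1.0, 10.0, 200.0):
    print(penalty, optimal_partitioning(y, penalty))
```

On this toy series a small penalty keeps both shifts, a moderate one keeps only the large shift, and a very large one reports no change at all, which is why exploring a range of penalties is attractive. The quadratic-time search here is only for readability.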

The second part of this thesis looks at the choice of cost function used in the optimisation. In particular, we develop a computationally efficient method that uses a nonparametric cost function, allowing changes to be detected in a wider variety of data-sets. This nonparametric approach uses the empirical cumulative distribution of the data and thus does not require any assumptions to be made about distributional parameters.
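As an illustration of how a cost can be built from the empirical cumulative distribution alone, the sketch below evaluates a segment's empirical CDF at a few fixed thresholds and scores it with a binomial log-likelihood. The exact form of the cost, the function name `ecdf_cost`, and the choice of thresholds are assumptions made for this example, not the thesis's definition.

```python
import math
from bisect import bisect_right


def ecdf_cost(segment, thresholds):
    """Assumed cost: -2 x binomial log-likelihood of the segment's
    empirical CDF, averaged over the given thresholds."""
    n = len(segment)
    ordered = sorted(segment)
    total = 0.0
    for q in thresholds:
        f = bisect_right(ordered, q) / n  # empirical CDF of the segment at q
        if 0.0 < f < 1.0:  # terms with f equal to 0 or 1 contribute zero
            total += n * (f * math.log(f) + (1.0 - f) * math.log(1.0 - f))
    return -2.0 * total / len(thresholds)


# Hypothetical data: two groups with clearly different distributions.
left = [0.1, 0.2, 0.3, 0.4]
right = [10.1, 10.2, 10.3, 10.4]
thresholds = [0.25, 5.0, 10.25]
print(ecdf_cost(left + right, thresholds))
print(ecdf_cost(left, thresholds) + ecdf_cost(right, thresholds))
```

Fitting the two groups separately gives a lower total cost than fitting them as one segment, and no mean, variance, or distributional family was specified anywhere. A cost of this kind could be plugged into a penalised search in place of a parametric one.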

The third part of this thesis looks at ways to parallelise detection methods in order to use multi-core computers, allowing changes to be detected in much larger data-sets than was previously possible. We look at different ways to split the data across multiple cores and then merge the results, aiming to preserve as much as possible of the accuracy obtained when using a single core.
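The split-and-merge idea can be sketched as follows. This is a simplified illustration under stated assumptions, not one of the schemes studied in the thesis: `best_single_change` is a toy detector that finds at most one change in mean per chunk, `parallel_detect` and `min_gain` are hypothetical names, and a thread pool stands in for the process pool that would be needed to occupy several cores in CPython. The series is assumed to be non-empty.

```python
from concurrent.futures import ThreadPoolExecutor


def best_single_change(chunk, min_gain):
    """Toy detector (assumption): at most one change in mean, accepted
    only if it reduces the squared error by more than min_gain."""
    n = len(chunk)
    total, total2 = sum(chunk), sum(v * v for v in chunk)
    sse_all = total2 - total * total / n
    best_gain, best_k = 0.0, None
    left = left2 = 0.0
    for k in range(1, n):
        left += chunk[k - 1]
        left2 += chunk[k - 1] ** 2
        right, right2 = total - left, total2 - left2
        sse_split = (left2 - left * left / k) + (right2 - right * right / (n - k))
        gain = sse_all - sse_split
        if gain > best_gain:
            best_gain, best_k = gain, k
    return [best_k] if best_k is not None and best_gain > min_gain else []


def parallel_detect(y, n_chunks, min_gain):
    """Split y into contiguous chunks, detect per chunk, merge the results."""
    size = -(-len(y) // n_chunks)  # ceiling division
    offsets = range(0, len(y), size)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        results = pool.map(
            lambda o: best_single_change(y[o:o + size], min_gain), offsets
        )
        # Shift chunk-local positions back to positions in the full series.
        return sorted(o + k for o, found in zip(offsets, results) for k in found)


# Hypothetical series: changes at positions 15 and 40, split into two chunks.
print(parallel_detect([0.0] * 15 + [4.0] * 25 + [9.0] * 20, 2, 1.0))
# A change that falls exactly on the chunk boundary is invisible to both chunks.
print(parallel_detect([0.0] * 30 + [5.0] * 30, 2, 1.0))
```

The second call shows the kind of accuracy loss that naive splitting can cause: neither chunk contains the change, so the merged result misses it. Overlapping chunks or a second pass around the boundaries are possible remedies.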