Detecting abrupt changes in big data

Associated organisational unit

Statistical Artificial Intelligence

Electronic data

2017haynesphd
Final published version, 1.84 MB, PDF document
Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Text available via DOI:

https://doi.org/10.17635/lancaster/thesis/13
Final published version

Research output: Thesis › Doctoral Thesis

Published

Standard

Detecting abrupt changes in big data. / Haynes, Kaylea.
Lancaster University, 2017. 186 p.

Harvard

Haynes, K 2017, 'Detecting abrupt changes in big data', PhD, Lancaster University. https://doi.org/10.17635/lancaster/thesis/13

APA

Haynes, K. (2017). Detecting abrupt changes in big data. [Doctoral Thesis, Lancaster University]. Lancaster University. https://doi.org/10.17635/lancaster/thesis/13

Vancouver

Haynes K. Detecting abrupt changes in big data. Lancaster University, 2017. 186 p. doi: 10.17635/lancaster/thesis/13

Author

Haynes, Kaylea. / Detecting abrupt changes in big data. Lancaster University, 2017. 186 p.

Bibtex

@phdthesis{459ee1698be24d4986d18a7044f9b135,

title = "Detecting abrupt changes in big data",

abstract = "This thesis looks at developing methods for changepoint detection that can be used in the realm of Big Data. In particular we look at developing methods that can be scaled to the volume of data, now readily collected and stored, and are also versatile to the different varieties of data. A well established approach to detect changes uses penalised optimisation where the choice of the penalty has a huge impact on the performance of the method. In the first part of this thesis we propose an algorithm, CROPS (Changepoints over a Range of PenaltieS), which finds the optimal solutions for a range of penalties instead of only specifying one penalty. The second part of this thesis looks at the choice of cost function used in the optimisation. In particular we develop a computationally efficient method, which uses a nonparametric cost function, allowing for changes to be detected in a larger variety of data-sets. This nonparametric approach uses the empirical cumulative distribution of the data and thus does not require any assumptions to be made on distributional parameters. The third part of this thesis looks at ways to parallelise detection methods in order to use multi-core computers and thus allowing for changes to be detected in much larger data-sets than they could be previously. We look at different ways to split the data across multiple cores and then merge the results to try to conserve as much of the accuracy that we had when we only used one core.",

author = "Kaylea Haynes",

year = "2017",

doi = "10.17635/lancaster/thesis/13",

language = "English",

publisher = "Lancaster University",

school = "Lancaster University",

}

RIS

TY - BOOK

T1 - Detecting abrupt changes in big data

AU - Haynes, Kaylea

PY - 2017

Y1 - 2017

N2 - This thesis looks at developing methods for changepoint detection that can be used in the realm of Big Data. In particular we look at developing methods that can be scaled to the volume of data, now readily collected and stored, and are also versatile to the different varieties of data. A well established approach to detect changes uses penalised optimisation where the choice of the penalty has a huge impact on the performance of the method. In the first part of this thesis we propose an algorithm, CROPS (Changepoints over a Range of PenaltieS), which finds the optimal solutions for a range of penalties instead of only specifying one penalty. The second part of this thesis looks at the choice of cost function used in the optimisation. In particular we develop a computationally efficient method, which uses a nonparametric cost function, allowing for changes to be detected in a larger variety of data-sets. This nonparametric approach uses the empirical cumulative distribution of the data and thus does not require any assumptions to be made on distributional parameters. The third part of this thesis looks at ways to parallelise detection methods in order to use multi-core computers and thus allowing for changes to be detected in much larger data-sets than they could be previously. We look at different ways to split the data across multiple cores and then merge the results to try to conserve as much of the accuracy that we had when we only used one core.

AB - This thesis looks at developing methods for changepoint detection that can be used in the realm of Big Data. In particular we look at developing methods that can be scaled to the volume of data, now readily collected and stored, and are also versatile to the different varieties of data. A well established approach to detect changes uses penalised optimisation where the choice of the penalty has a huge impact on the performance of the method. In the first part of this thesis we propose an algorithm, CROPS (Changepoints over a Range of PenaltieS), which finds the optimal solutions for a range of penalties instead of only specifying one penalty. The second part of this thesis looks at the choice of cost function used in the optimisation. In particular we develop a computationally efficient method, which uses a nonparametric cost function, allowing for changes to be detected in a larger variety of data-sets. This nonparametric approach uses the empirical cumulative distribution of the data and thus does not require any assumptions to be made on distributional parameters. The third part of this thesis looks at ways to parallelise detection methods in order to use multi-core computers and thus allowing for changes to be detected in much larger data-sets than they could be previously. We look at different ways to split the data across multiple cores and then merge the results to try to conserve as much of the accuracy that we had when we only used one core.

U2 - 10.17635/lancaster/thesis/13

DO - 10.17635/lancaster/thesis/13

M3 - Doctoral Thesis

PB - Lancaster University

ER -
