Electronic data

  • 2017daviesphd

    Final published version, 20.3 MB, PDF document

    Available under license: CC BY-ND: Creative Commons Attribution-NoDerivatives 4.0 International License

Efficient analysis of data streams

Research output: Thesis › Doctoral Thesis

Published

Standard

Efficient analysis of data streams. / Davies, Rhian.
Lancaster University, 2017. 132 p.

APA

Davies, R. (2017). Efficient analysis of data streams. [Doctoral Thesis, Lancaster University]. Lancaster University. https://doi.org/10.17635/lancaster/thesis/137

Vancouver

Davies R. Efficient analysis of data streams. Lancaster University, 2017. 132 p. doi: 10.17635/lancaster/thesis/137

Author

Davies, Rhian. / Efficient analysis of data streams. Lancaster University, 2017. 132 p.

Bibtex

@phdthesis{4b00b64c48e84210850019936ad6a0bf,
title = "Efficient analysis of data streams",
abstract = "Data streams provide a challenging environment for statistical analysis. Data points can arrive at a high velocity and may need to be deleted once they have been observed. Due to these restrictions, standard techniques may not be applicable to the data streaming scenario. This leads to the need for data summaries to represent the data stream. This thesis explores how data summaries can be used to perform clustering and classification on data streams across a broad range of applications. Spectral clustering is one such technique which prior to this work has not been applicable to the data streaming setting due to the high computation involved. CluStream is an existing method which uses micro-clusters to summarise data streams. We present two algorithms which utilise these micro-cluster summaries to enable spectral clustering to be performed on data streams. The methods were tested on simulated data streams, as well as textured images and hand-written digits. Distributed acoustic sensing is used to monitor oil flow at various depths throughout an oil well. Vibrations are recorded at very high resolutions, up to 10000 observations a second at each depth. Unfortunately, corruption can occur in the signal and engineers need to know where corruption occurs. We develop a method which treats the multiple time series as a high-dimensional clustering problem and uses the cluster labels to identify changes within the signal. The final piece of work concerns identifying areas of activity within a video stream, in particular CCTV footage. It is more efficient if this classification stage is performed on a compressed version of the video stream. In order to reconstruct areas of activity in the original video a recovery algorithm is needed. We present a comparison of the performance of two recovery algorithms and identify an ideal range for the compression ratio.",
author = "Rhian Davies",
year = "2017",
doi = "10.17635/lancaster/thesis/137",
language = "English",
publisher = "Lancaster University",
school = "Lancaster University",

}

RIS

TY - THES

T1 - Efficient analysis of data streams

AU - Davies, Rhian

PY - 2017

Y1 - 2017

N2 - Data streams provide a challenging environment for statistical analysis. Data points can arrive at a high velocity and may need to be deleted once they have been observed. Due to these restrictions, standard techniques may not be applicable to the data streaming scenario. This leads to the need for data summaries to represent the data stream. This thesis explores how data summaries can be used to perform clustering and classification on data streams across a broad range of applications. Spectral clustering is one such technique which prior to this work has not been applicable to the data streaming setting due to the high computation involved. CluStream is an existing method which uses micro-clusters to summarise data streams. We present two algorithms which utilise these micro-cluster summaries to enable spectral clustering to be performed on data streams. The methods were tested on simulated data streams, as well as textured images and hand-written digits. Distributed acoustic sensing is used to monitor oil flow at various depths throughout an oil well. Vibrations are recorded at very high resolutions, up to 10000 observations a second at each depth. Unfortunately, corruption can occur in the signal and engineers need to know where corruption occurs. We develop a method which treats the multiple time series as a high-dimensional clustering problem and uses the cluster labels to identify changes within the signal. The final piece of work concerns identifying areas of activity within a video stream, in particular CCTV footage. It is more efficient if this classification stage is performed on a compressed version of the video stream. In order to reconstruct areas of activity in the original video a recovery algorithm is needed. We present a comparison of the performance of two recovery algorithms and identify an ideal range for the compression ratio.

AB - Data streams provide a challenging environment for statistical analysis. Data points can arrive at a high velocity and may need to be deleted once they have been observed. Due to these restrictions, standard techniques may not be applicable to the data streaming scenario. This leads to the need for data summaries to represent the data stream. This thesis explores how data summaries can be used to perform clustering and classification on data streams across a broad range of applications. Spectral clustering is one such technique which prior to this work has not been applicable to the data streaming setting due to the high computation involved. CluStream is an existing method which uses micro-clusters to summarise data streams. We present two algorithms which utilise these micro-cluster summaries to enable spectral clustering to be performed on data streams. The methods were tested on simulated data streams, as well as textured images and hand-written digits. Distributed acoustic sensing is used to monitor oil flow at various depths throughout an oil well. Vibrations are recorded at very high resolutions, up to 10000 observations a second at each depth. Unfortunately, corruption can occur in the signal and engineers need to know where corruption occurs. We develop a method which treats the multiple time series as a high-dimensional clustering problem and uses the cluster labels to identify changes within the signal. The final piece of work concerns identifying areas of activity within a video stream, in particular CCTV footage. It is more efficient if this classification stage is performed on a compressed version of the video stream. In order to reconstruct areas of activity in the original video a recovery algorithm is needed. We present a comparison of the performance of two recovery algorithms and identify an ideal range for the compression ratio.

U2 - 10.17635/lancaster/thesis/137

DO - 10.17635/lancaster/thesis/137

M3 - Doctoral Thesis

PB - Lancaster University

ER -