Parallel computing TEDA for high frequency streaming data clustering

Computing and Communications

Associated organisational units

Electronic data

BigData16
Rights statement: The final publication is available at Springer via http://dx.doi.org/[insert DOI]
Accepted author manuscript, 4.77 MB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Text available via DOI:

https://doi.org/10.1007/978-3-319-47898-2_25
Final published version

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Xiaowei Gu
Plamen Parvanov Angelov
German Gutierrez
Jose Antonio Iglesias
Araceli Sanchis

Publication date	23/10/2016
Host publication	Advances in Big Data: Proceedings of the 2nd INNS Conference on Big Data, October 23-25, 2016, Thessaloniki, Greece
Editors	Plamen Angelov, Yannis Manolopoulos, Lazaros Iliadis, Asim Roy, Marley Vellasco
Place of Publication	Cham
Publisher	Springer
Pages	238-253
Number of pages	16
ISBN (electronic)	9783319478982
ISBN (print)	9783319478975
Original language	English
Event	2nd International Neural Network Society Conference on Big Data, INNS 2016 - Thessaloniki, Greece. Duration: 23/10/2016 → 25/10/2016

Conference

Conference	2nd International Neural Network Society Conference on Big Data, INNS 2016
Country/Territory	Greece
City	Thessaloniki
Period	23/10/16 → 25/10/16

Abstract

This paper introduces Parallel_TEDA, a novel online clustering approach for processing high-frequency streaming data. The approach is developed within the recently introduced TEDA theory and inherits its advantages. It employs a number of data stream processors that work on chunks of the whole data stream and collaborate to achieve parallel computation and a much higher processing speed. A fusion center gathers the key information from the processors and generates the overall output. The quality of the generated clusters is monitored continuously within the data processors, and stale clusters are removed to ensure the correctness and timeliness of the overall clustering results. This, in turn, gives the approach a stronger ability to handle shifts and drifts that may take place in live data streams. Numerical experiments on benchmark datasets show that Parallel_TEDA achieves higher performance and faster processing speed than well-known alternative approaches. The processing time has been demonstrated to fall exponentially as more data processors are involved. The approach is therefore well suited to real-time, high-frequency stream processing and data analytics.
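
The architecture described in the abstract (several processors working on chunks, stale-cluster removal, and a fusion center) can be illustrated with a minimal Python sketch. This is not the authors' algorithm and does not implement TEDA: a simple distance-threshold rule stands in for the clustering step, and every name and parameter here (`process_chunk`, `fuse`, `parallel_cluster`, `radius`, `max_age`) is a hypothetical choice made for illustration only.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Cluster:
    center: list    # running mean of the samples assigned to this cluster
    count: int      # number of samples assigned so far
    last_seen: int  # chunk-local index of the most recent assigned sample


def process_chunk(chunk, radius=1.0, max_age=50):
    """Hypothetical stream processor (threshold clustering, NOT TEDA)."""
    clusters = []
    for t, x in enumerate(chunk):
        nearest = min(clusters, key=lambda c: math.dist(c.center, x), default=None)
        if nearest is not None and math.dist(nearest.center, x) <= radius:
            # Recursive mean update, so no past samples need to be stored.
            nearest.count += 1
            nearest.center = [m + (xi - m) / nearest.count
                              for m, xi in zip(nearest.center, x)]
            nearest.last_seen = t
        else:
            clusters.append(Cluster(list(x), 1, t))
        # Remove stale clusters: those not updated within the last max_age samples.
        clusters = [c for c in clusters if t - c.last_seen <= max_age]
    return clusters


def fuse(cluster_lists, radius=1.0):
    """Hypothetical fusion center: merge clusters whose centers are within radius."""
    fused = []
    for c in (c for clusters in cluster_lists for c in clusters):
        for f in fused:
            if math.dist(f.center, c.center) <= radius:
                total = f.count + c.count
                # Count-weighted average of the two centers.
                f.center = [(fm * f.count + cm * c.count) / total
                            for fm, cm in zip(f.center, c.center)]
                f.count = total
                break
        else:
            fused.append(Cluster(list(c.center), c.count, c.last_seen))
    return fused


def parallel_cluster(stream, n_processors=2, radius=1.0, max_age=50):
    """Split the stream into chunks, cluster each chunk, then fuse the results."""
    size = max(1, math.ceil(len(stream) / n_processors))
    chunks = [stream[i:i + size] for i in range(0, len(stream), size)]
    with ThreadPoolExecutor(max_workers=n_processors) as pool:
        results = list(pool.map(lambda ch: process_chunk(ch, radius, max_age), chunks))
    return fuse(results, radius)
```

Threads are used here only to show the structure; in CPython they do not run CPU-bound work in parallel, so an actual speed-up would need separate processes or machines. The sketch also says nothing about how processing time scales in the paper's experiments.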
