A new online clustering approach for data in arbitrary shaped clusters

Computing and Communications

Associated organisational units

Electronic data

CYBCONF2015_CODAS
Rights statement: ©2015 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Accepted author manuscript, 646 KB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Text available via DOI:

https://doi.org/10.1109/CYBConf.2015.7175937
Final published version

Keywords

clustering, CODAS, online, data streams, big data, arbitrary shape, micro-cluster

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Publication date	24/06/2015
Host publication	Cybernetics (CYBCONF), 2015 IEEE 2nd International Conference on
Publisher	IEEE
Pages	228-233
Number of pages	6
ISBN (print)	9781479983209
Original language	English
Event	CYBCONF - Gdynia, Poland Duration: 24/06/2015 → 26/06/2015

Conference

Conference	CYBCONF
Country/Territory	Poland
City	Gdynia
Period	24/06/15 → 26/06/15

Abstract

In this paper we demonstrate a new density-based clustering technique, CODAS, for online clustering of streaming data into arbitrarily shaped clusters. CODAS is a two-stage process that uses a simple local density to initiate micro-clusters, which are then combined into clusters. Memory efficiency is gained by not storing or re-using any data. Computational efficiency is gained by using hyper-spherical micro-clusters to achieve a fast micro-cluster joining technique that is dimensionally independent. The micro-clusters divide the data space into sub-spaces with a core region and a non-core region. Core regions which intersect define the clusters. A threshold value is used to identify outlier micro-clusters separately from small clusters of unusual data. The cluster information is fully maintained online. We compare CODAS with ELM, DEC, Chameleon, DBSCAN and DenStream and demonstrate that CODAS achieves comparable results, but in a fully online and dimensionally scalable manner.
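
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `MicroClusterSketch`, the parameters `radius`, `min_points` and `core_fraction`, the choice of a core region with half the micro-cluster radius, and the centre update rule are all assumptions made for illustration. Each sample is absorbed into the nearest hyper-spherical micro-cluster or starts a new one, and is then discarded. Micro-clusters whose count reaches the threshold are joined whenever their core regions intersect, which needs only one distance comparison between centres per pair of micro-clusters.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    centre: list      # current centre of the hypersphere
    count: int = 1    # number of samples absorbed so far


class MicroClusterSketch:
    """Illustrative sketch only; not the CODAS reference implementation."""

    def __init__(self, radius, min_points, core_fraction=0.5):
        self.radius = radius
        self.core_radius = core_fraction * radius  # assumed core size
        self.min_points = min_points               # assumed outlier threshold
        self.micro = []

    def add(self, x):
        """Absorb one sample; the sample itself is never stored."""
        nearest, dist = None, math.inf
        for mc in self.micro:
            d = math.dist(mc.centre, x)
            if d < dist:
                nearest, dist = mc, d
        if nearest is None or dist > self.radius:
            # Outside every micro-cluster: start a new one at the sample.
            self.micro.append(MicroCluster(list(x)))
            return
        nearest.count += 1
        if dist <= self.core_radius:
            # Core hit: nudge the centre toward the sample with step 1/count
            # (an assumed update rule).
            step = 1.0 / nearest.count
            nearest.centre = [c + step * (xi - c)
                              for c, xi in zip(nearest.centre, x)]

    def clusters(self):
        """Join dense micro-clusters whose core regions intersect."""
        dense = [mc for mc in self.micro if mc.count >= self.min_points]
        labels = [-1] * len(dense)
        label = 0
        for start in range(len(dense)):
            if labels[start] != -1:
                continue
            labels[start] = label
            stack = [start]
            while stack:
                j = stack.pop()
                for k, other in enumerate(dense):
                    # Two cores intersect when the centres are no further
                    # apart than the sum of the two core radii.
                    if (labels[k] == -1 and
                            math.dist(dense[j].centre, other.centre)
                            <= 2 * self.core_radius):
                        labels[k] = label
                        stack.append(k)
            label += 1
        return dense, labels


model = MicroClusterSketch(radius=1.0, min_points=2)
stream = [(0.0, 0.0), (0.0, 0.0), (1.1, 0.0), (0.7, 0.0),
          (10.0, 0.0), (10.0, 0.0), (50.0, 50.0)]
for sample in stream:
    model.add(sample)
dense, labels = model.clusters()
print(labels)  # [0, 0, 1]
```

In this toy stream the second micro-cluster drifts toward the first until their cores overlap, so the two are joined into one cluster. The single sample at (50, 50) stays below the `min_points` threshold and is left out as an outlier micro-cluster.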
