Electronic data

  • CYBCONF2015_CODAS

    Rights statement: ©2015 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

    Accepted author manuscript, 646 KB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License

A new online clustering approach for data in arbitrary shaped clusters

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN > Conference contribution/Paper > peer-review

Published
Publication date: 24/06/2015
Host publication: Cybernetics (CYBCONF), 2015 IEEE 2nd International Conference on
Publisher: IEEE
Pages: 228-233
Number of pages: 6
ISBN (print): 9781479983209
Original language: English
Event: CYBCONF - Gdynia, Poland
Duration: 24/06/2015 to 26/06/2015

Conference

Conference: CYBCONF
Country/Territory: Poland
City: Gdynia
Period: 24/06/15 to 26/06/15

Abstract

In this paper we demonstrate a new density-based clustering technique, CODAS, for online clustering of streaming data into arbitrarily shaped clusters. CODAS is a two-stage process that uses a simple local density measure to initiate micro-clusters, which are then combined into clusters. Memory efficiency is gained by not storing or re-using any data. Computational efficiency is gained by using hyper-spherical micro-clusters, which allow a fast micro-cluster joining technique that is independent of dimensionality. The micro-clusters divide the data space into sub-spaces, each with a core region and a non-core region. Intersecting core regions define the clusters. A threshold value is used to identify outlier micro-clusters separately from small clusters of unusual data. The cluster information is fully maintained online. We compare CODAS with ELM, DEC, Chameleon, DBSCAN and DenStream and demonstrate that CODAS achieves comparable results, but in a fully online and dimensionally scalable manner.
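
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. This is not the CODAS implementation from the paper. The class and method names, the core radius of half the micro-cluster radius, the running-mean centre update and the count-based threshold are all assumptions made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    center: list      # centre of the hyper-sphere
    count: int = 1    # number of points absorbed so far


class MicroClusterSketch:
    """Hypothetical illustration of micro-cluster based online clustering."""

    def __init__(self, radius, min_count):
        self.radius = radius        # micro-cluster radius (assumed parameter)
        self.min_count = min_count  # outlier threshold (assumed to be a count)
        self.micro = []

    def add(self, point):
        """Stage 1: absorb one streamed point; the point itself is not stored."""
        best, best_d = None, self.radius
        for mc in self.micro:
            d = math.dist(mc.center, point)
            if d <= best_d:         # nearest micro-cluster that contains the point
                best, best_d = mc, d
        if best is None:            # no micro-cluster contains it: start a new one
            self.micro.append(MicroCluster(list(point)))
            return
        best.count += 1
        if best_d <= self.radius / 2:   # assumed core region: inner half-radius
            n = best.count
            best.center = [c + (p - c) / n for c, p in zip(best.center, point)]

    def outliers(self):
        """Indices of micro-clusters below the threshold."""
        return [i for i, mc in enumerate(self.micro) if mc.count < self.min_count]

    def clusters(self):
        """Stage 2: join micro-clusters whose core regions intersect."""
        dense = [i for i, mc in enumerate(self.micro) if mc.count >= self.min_count]
        parent = {i: i for i in dense}

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]   # path halving
                i = parent[i]
            return i

        for pos, a in enumerate(dense):
            for b in dense[pos + 1:]:
                gap = math.dist(self.micro[a].center, self.micro[b].center)
                if gap < self.radius:   # two cores of radius r/2 overlap
                    parent[find(a)] = find(b)

        groups = {}
        for i in dense:
            groups.setdefault(find(i), []).append(i)
        return sorted(groups.values())
```

Feeding points one at a time through `add` and then calling `clusters()` returns groups of micro-cluster indices, while `outliers()` lists the micro-clusters that stayed below the threshold. The joining test only compares distances between centres, so its cost per pair does not depend on how many points each micro-cluster has absorbed.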
