Rights statement: The final publication is available at Springer via http://dx.doi.org/10.1007/s11222-015-9597-y
Accepted author manuscript, 643 KB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License
Final published version
Research output: Contribution to Journal/Magazine › Journal article › peer-review
TY - JOUR
T1 - Divisive clustering of high dimensional data streams
AU - Hofmeyr, David
AU - Pavlidis, Nicos
AU - Eckley, Idris
N1 - Publication is available at: http://link.springer.com/article/10.1007%2Fs11222-015-9597-y
PY - 2016/9
Y1 - 2016/9
N2 - Clustering streaming data is gaining importance as automatic data acquisition technologies are deployed in diverse applications. We propose a fully incremental projected divisive clustering method for high-dimensional data streams that is motivated by high density clustering. The method is capable of identifying clusters in arbitrary subspaces, estimating the number of clusters, and detecting changes in the data distribution which necessitate a revision of the model. The empirical evaluation of the proposed method on numerous real and simulated datasets shows that it is scalable in dimension and number of clusters, is robust to noisy and irrelevant features, and is capable of handling a variety of types of non-stationarity.
AB - Clustering streaming data is gaining importance as automatic data acquisition technologies are deployed in diverse applications. We propose a fully incremental projected divisive clustering method for high-dimensional data streams that is motivated by high density clustering. The method is capable of identifying clusters in arbitrary subspaces, estimating the number of clusters, and detecting changes in the data distribution which necessitate a revision of the model. The empirical evaluation of the proposed method on numerous real and simulated datasets shows that it is scalable in dimension and number of clusters, is robust to noisy and irrelevant features, and is capable of handling a variety of types of non-stationarity.
KW - Clustering
KW - Data stream
KW - High dimensionality
KW - Population drift
KW - Modality testing
U2 - 10.1007/s11222-015-9597-y
DO - 10.1007/s11222-015-9597-y
M3 - Journal article
VL - 26
SP - 1101
EP - 1120
JO - Statistics and Computing
JF - Statistics and Computing
SN - 0960-3174
IS - 5
ER -