Divisive clustering of high dimensional data streams

Electronic data

Divisive_Clustering_of_High_Dimensional_Data_Streams
Rights statement: The final publication is available at Springer via http://dx.doi.org/10.1007/s11222-015-9597-y
Accepted author manuscript, 643 KB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Text available via DOI:

https://doi.org/10.1007/s11222-015-9597-y
Final published version

Keywords

Clustering, Data stream, High dimensionality, Population drift, Modality testing

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Divisive clustering of high dimensional data streams. / Hofmeyr, David ; Pavlidis, Nicos ; Eckley, Idris.
In: Statistics and Computing, Vol. 26, No. 5, 09.2016, p. 1101–1120.

Harvard

Hofmeyr, D, Pavlidis, N & Eckley, I 2016, 'Divisive clustering of high dimensional data streams', Statistics and Computing, vol. 26, no. 5, pp. 1101–1120. https://doi.org/10.1007/s11222-015-9597-y

APA

Hofmeyr, D., Pavlidis, N., & Eckley, I. (2016). Divisive clustering of high dimensional data streams. Statistics and Computing, 26(5), 1101–1120. https://doi.org/10.1007/s11222-015-9597-y

Vancouver

Hofmeyr D, Pavlidis N, Eckley I. Divisive clustering of high dimensional data streams. Statistics and Computing. 2016 Sept;26(5):1101–1120. Epub 2015 Jul 31. doi: 10.1007/s11222-015-9597-y

Author

Hofmeyr, David ; Pavlidis, Nicos ; Eckley, Idris. / Divisive clustering of high dimensional data streams. In: Statistics and Computing. 2016 ; Vol. 26, No. 5. pp. 1101–1120.

Bibtex

@article{132c424447bf45b0b71415dfdc50ae5a,

title = "Divisive clustering of high dimensional data streams",

abstract = "Clustering streaming data is gaining importance as automatic data acquisition technologies are deployed in diverse applications. We propose a fully incremental projected divisive clustering method for high-dimensional data streams that is motivated by high density clustering. The method is capable of identifying clusters in arbitrary subspaces, estimating the number of clusters, and detecting changes in the data distribution which necessitate a revision of the model. The empirical evaluation of the proposed method on numerous real and simulated datasets shows that it is scalable in dimension and number of clusters, is robust to noisy and irrelevant features, and is capable of handling a variety of types of non-stationarity.",

keywords = "Clustering, Data stream, High dimensionality, Population drift, Modality testing",

author = "David Hofmeyr and Nicos Pavlidis and Idris Eckley",

note = "Publication is available at: http://link.springer.com/article/10.1007%2Fs11222-015-9597-y",

year = "2016",

month = sep,

doi = "10.1007/s11222-015-9597-y",

language = "English",

volume = "26",

pages = "1101--1120",

journal = "Statistics and Computing",

issn = "0960-3174",

publisher = "Springer Netherlands",

number = "5",

}

RIS

TY - JOUR

T1 - Divisive clustering of high dimensional data streams

AU - Hofmeyr, David

AU - Pavlidis, Nicos

AU - Eckley, Idris

N1 - Publication is available at: http://link.springer.com/article/10.1007%2Fs11222-015-9597-y

PY - 2016/9

Y1 - 2016/9

N2 - Clustering streaming data is gaining importance as automatic data acquisition technologies are deployed in diverse applications. We propose a fully incremental projected divisive clustering method for high-dimensional data streams that is motivated by high density clustering. The method is capable of identifying clusters in arbitrary subspaces, estimating the number of clusters, and detecting changes in the data distribution which necessitate a revision of the model. The empirical evaluation of the proposed method on numerous real and simulated datasets shows that it is scalable in dimension and number of clusters, is robust to noisy and irrelevant features, and is capable of handling a variety of types of non-stationarity.

AB - Clustering streaming data is gaining importance as automatic data acquisition technologies are deployed in diverse applications. We propose a fully incremental projected divisive clustering method for high-dimensional data streams that is motivated by high density clustering. The method is capable of identifying clusters in arbitrary subspaces, estimating the number of clusters, and detecting changes in the data distribution which necessitate a revision of the model. The empirical evaluation of the proposed method on numerous real and simulated datasets shows that it is scalable in dimension and number of clusters, is robust to noisy and irrelevant features, and is capable of handling a variety of types of non-stationarity.

KW - Clustering

KW - Data stream

KW - High dimensionality

KW - Population drift

KW - Modality testing

U2 - 10.1007/s11222-015-9597-y

DO - 10.1007/s11222-015-9597-y

M3 - Journal article

VL - 26

SP - 1101

EP - 1120

JO - Statistics and Computing

JF - Statistics and Computing

SN - 0960-3174

IS - 5

ER -
