- khaleghi16a
**Rights statement:**c 2016 Azadeh Khaleghi, Daniil Ryabko, Jeremie Mary and Philippe Preux.Accepted author manuscript, 477 KB, PDF document

Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

- khaleghi16a
Final published version, 477 KB, PDF document

Available under license: CC BY: Creative Commons Attribution 4.0 International License

- http://jmlr.org/papers/v17/khaleghi16a.html
Final published version

Licence: CC BY: Creative Commons Attribution 4.0 International License

Research output: Contribution to journal › Journal article › peer-review

Published

**Consistent algorithms for clustering time series.** / Khaleghi, Azedeh; Ryabko, Daniil; Mari, Jeremie; Preux, Philippe.

Research output: Contribution to journal › Journal article › peer-review

Khaleghi, A, Ryabko, D, Mari, J & Preux, P 2016, 'Consistent algorithms for clustering time series', *Journal of Machine Learning Research*, vol. 17, no. 3, pp. 1-32. <http://jmlr.org/papers/v17/khaleghi16a.html>

Khaleghi, A., Ryabko, D., Mari, J., & Preux, P. (2016). Consistent algorithms for clustering time series. *Journal of Machine Learning Research*, *17*(3), 1-32. http://jmlr.org/papers/v17/khaleghi16a.html

Khaleghi A, Ryabko D, Mari J, Preux P. Consistent algorithms for clustering time series. Journal of Machine Learning Research. 2016 Mar;17(3):1-32.

@article{235912aa0d0c4b48bca431673c094e38,

title = "Consistent algorithms for clustering time series",

abstract = "The problem of clustering is considered for the case where every point is a time series. The time series are either given in one batch (offline setting), or they are allowed to grow with time and new time series can be added along the way (online setting). We propose a natural notion of consistency for this problem, and show that there are simple, computationally efficient algorithms that are asymptotically consistent under extremely weak assumptions on the distributions that generate the data. The notion of consistency is as follows. A clustering algorithm is called consistent if it places two time series into the same cluster if and only if the distribution that generates them is the same. In the considered framework the time series are allowed to be highly dependent, and the dependence can have arbitrary form. If the number of clusters is known, the only assumption we make is that the (marginal) distribution of each time series is stationary ergodic. No parametric, memory or mixing assumptions are made. When the number of clusters is unknown, stronger assumptions are provably necessary, but it is still possible to devise nonparametric algorithms that are consistent under very general conditions. The theoretical findings of this work are illustrated with experiments on both synthetic and real data.",

keywords = "clustering, time series, ergodicity, unsupervised learning",

author = "Azedeh Khaleghi and Daniil Ryabko and Jeremie Mari and Philippe Preux",

note = "c 2016 Azadeh Khaleghi, Daniil Ryabko, Jeremie Mary and Philippe Preux. ",

year = "2016",

month = mar,

language = "English",

volume = "17",

pages = "1--32",

journal = "Journal of Machine Learning Research",

issn = "1532-4435",

publisher = "Microtome Publishing",

number = "3",

}

TY - JOUR

T1 - Consistent algorithms for clustering time series

AU - Khaleghi, Azedeh

AU - Ryabko, Daniil

AU - Mari, Jeremie

AU - Preux, Philippe

N1 - c 2016 Azadeh Khaleghi, Daniil Ryabko, Jeremie Mary and Philippe Preux.

PY - 2016/3

Y1 - 2016/3

N2 - The problem of clustering is considered for the case where every point is a time series. The time series are either given in one batch (offline setting), or they are allowed to grow with time and new time series can be added along the way (online setting). We propose a natural notion of consistency for this problem, and show that there are simple, computationally efficient algorithms that are asymptotically consistent under extremely weak assumptions on the distributions that generate the data. The notion of consistency is as follows. A clustering algorithm is called consistent if it places two time series into the same cluster if and only if the distribution that generates them is the same. In the considered framework the time series are allowed to be highly dependent, and the dependence can have arbitrary form. If the number of clusters is known, the only assumption we make is that the (marginal) distribution of each time series is stationary ergodic. No parametric, memory or mixing assumptions are made. When the number of clusters is unknown, stronger assumptions are provably necessary, but it is still possible to devise nonparametric algorithms that are consistent under very general conditions. The theoretical findings of this work are illustrated with experiments on both synthetic and real data.

AB - The problem of clustering is considered for the case where every point is a time series. The time series are either given in one batch (offline setting), or they are allowed to grow with time and new time series can be added along the way (online setting). We propose a natural notion of consistency for this problem, and show that there are simple, computationally efficient algorithms that are asymptotically consistent under extremely weak assumptions on the distributions that generate the data. The notion of consistency is as follows. A clustering algorithm is called consistent if it places two time series into the same cluster if and only if the distribution that generates them is the same. In the considered framework the time series are allowed to be highly dependent, and the dependence can have arbitrary form. If the number of clusters is known, the only assumption we make is that the (marginal) distribution of each time series is stationary ergodic. No parametric, memory or mixing assumptions are made. When the number of clusters is unknown, stronger assumptions are provably necessary, but it is still possible to devise nonparametric algorithms that are consistent under very general conditions. The theoretical findings of this work are illustrated with experiments on both synthetic and real data.

KW - clustering

KW - time series

KW - ergodicity

KW - unsupervised learning

M3 - Journal article

VL - 17

SP - 1

EP - 32

JO - Journal of Machine Learning Research

JF - Journal of Machine Learning Research

SN - 1532-4435

IS - 3

ER -