Clustering of biological time series by cepstral coefficients based distances

Mathematics and Statistics

Text available via DOI:

https://doi.org/10.1016/j.patcog.2008.01.002
Final published version

Keywords

Exponential model, Likelihood, Distance measures, Spectral analysis, Periodogram, Data mining, Protein sequence analysis


Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Clustering of biological time series by cepstral coefficients based distances. / Savvides, A.; Promponas, V.J.; Fokianos, K.
In: Pattern Recognition, Vol. 41, No. 7, 07.2008, p. 2398-2412.


Harvard

Savvides, A, Promponas, VJ & Fokianos, K 2008, 'Clustering of biological time series by cepstral coefficients based distances', Pattern Recognition, vol. 41, no. 7, pp. 2398-2412. https://doi.org/10.1016/j.patcog.2008.01.002

APA

Savvides, A., Promponas, V. J., & Fokianos, K. (2008). Clustering of biological time series by cepstral coefficients based distances. Pattern Recognition, 41(7), 2398-2412. https://doi.org/10.1016/j.patcog.2008.01.002

Vancouver

Savvides A, Promponas VJ, Fokianos K. Clustering of biological time series by cepstral coefficients based distances. Pattern Recognition. 2008 Jul;41(7):2398-2412. Epub 2008 Jan 17. doi: 10.1016/j.patcog.2008.01.002

Author

Savvides, A. ; Promponas, V.J. ; Fokianos, K. / Clustering of biological time series by cepstral coefficients based distances. In: Pattern Recognition. 2008 ; Vol. 41, No. 7. pp. 2398-2412.

Bibtex

@article{b430ee26fdbf42ecb95489e49b6cba47,

title = "Clustering of biological time series by cepstral coefficients based distances",

abstract = "Clustering of stationary time series has become an important tool in many scientific applications, like medicine, finance, etc. Time series clustering methods are based on the calculation of suitable similarity measures which identify the distance between two or more time series. These measures are either computed in the time domain or in the spectral domain. Since the computation of time domain measures is rather cumbersome we resort to spectral domain methods. A new measure of distance is proposed and it is based on the so-called cepstral coefficients which carry information about the log spectrum of a stationary time series. These coefficients are estimated by means of a semiparametric model which assumes that the log-likelihood ratio of two or more unknown spectral densities has a linear parametric form. After estimation, the estimated cepstral distance measure is given as an input to a clustering method to produce the disjoint groups of data. Simulated examples show that the method yields good results, even when the processes are not necessarily linear. These cepstral-based clustering algorithms are applied to biological time series. In particular, the proposed methodology effectively identifies distinct and biologically relevant classes of amino acid sequences with the same physicochemical properties, such as hydrophobicity.",

keywords = "Exponential model, Likelihood, Distance measures, Spectral analysis, Periodogram, Data mining, Protein sequence analysis",

author = "A. Savvides and V.J. Promponas and K. Fokianos",

year = "2008",

month = jul,

doi = "10.1016/j.patcog.2008.01.002",

language = "English",

volume = "41",

pages = "2398--2412",

journal = "Pattern Recognition",

issn = "0031-3203",

publisher = "Elsevier Ltd",

number = "7",

}
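The abstract describes a distance built from cepstral coefficients, which are the Fourier coefficients of the log spectrum, followed by a clustering step. The stdlib-only Python sketch below illustrates that pipeline under stated assumptions. It is not the authors' code. It replaces the paper's semiparametric estimator with a simple plug-in estimator: a cosine transform of the log periodogram with the Euler-Mascheroni bias correction. It then uses a Euclidean distance over theta_1..theta_p and average-linkage clustering. All function names, the choice p = 5, and the simulated AR(1) data are hypothetical illustrations.

```python
import cmath
import math
import random

# Euler-Mascheroni constant: E[log I(w)] is roughly log f(w) - gamma for a
# periodogram ordinate, so adding gamma removes that bias.
EULER_GAMMA = 0.5772156649015329


def periodogram(x):
    """Return (w_j, I(w_j)) at Fourier frequencies w_j = 2*pi*j/n in (0, pi),
    with I(w) = |sum_t (x_t - mean) exp(-i w t)|^2 / (2*pi*n)."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    out = []
    for j in range(1, (n - 1) // 2 + 1):
        w = 2.0 * math.pi * j / n
        s = sum(v * cmath.exp(-1j * w * t) for t, v in enumerate(xc))
        out.append((w, abs(s) ** 2 / (2.0 * math.pi * n)))
    return out


def cepstral_coefficients(x, p):
    """Estimate theta_1..theta_p in log f(w) = theta_0 + 2*sum_k theta_k*cos(k*w)
    by a Riemann-sum cosine transform of the bias-corrected log periodogram.
    Assumption: a plug-in estimator, not the paper's semiparametric fit."""
    n = len(x)
    pg = periodogram(x)
    coeffs = []
    for k in range(1, p + 1):
        # theta_k = (1/pi) * integral_0^pi log f(w) cos(k w) dw, grid step 2*pi/n
        s = sum((math.log(max(i_w, 1e-300)) + EULER_GAMMA) * math.cos(k * w)
                for w, i_w in pg)
        coeffs.append(2.0 * s / n)
    return coeffs


def cepstral_distance(cx, cy):
    """Euclidean distance between two vectors of cepstral coefficients."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))


def average_linkage(dist, k):
    """Agglomerative clustering with average linkage until k groups remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def simulate_ar1(phi, n, rng, burn=100):
    """Hypothetical test data: Gaussian AR(1) path after a burn-in period."""
    x, out = 0.0, []
    for t in range(n + burn):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn:
            out.append(x)
    return out


rng = random.Random(2008)
series = ([simulate_ar1(0.8, 128, rng) for _ in range(3)]
          + [simulate_ar1(-0.8, 128, rng) for _ in range(3)])
ceps = [cepstral_coefficients(s, p=5) for s in series]
dist = [[cepstral_distance(a, b) for b in ceps] for a in ceps]
groups = average_linkage(dist, k=2)
print(sorted(sorted(g) for g in groups))
```

For an AR(1) process with coefficient phi, the true cepstral coefficients are theta_k = phi^k / k. The two simulated families therefore differ mainly in the sign of theta_1. The printed grouping should then separate the phi = 0.8 series (indices 0 to 2) from the phi = -0.8 series (indices 3 to 5). Any distance matrix of this form could be passed to a different clustering method instead.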

RIS

TY  - JOUR

T1  - Clustering of biological time series by cepstral coefficients based distances

AU  - Savvides, A.

AU  - Promponas, V.J.

AU  - Fokianos, K.

PY  - 2008/7

Y1  - 2008/7

N2  - Clustering of stationary time series has become an important tool in many scientific applications, like medicine, finance, etc. Time series clustering methods are based on the calculation of suitable similarity measures which identify the distance between two or more time series. These measures are either computed in the time domain or in the spectral domain. Since the computation of time domain measures is rather cumbersome we resort to spectral domain methods. A new measure of distance is proposed and it is based on the so-called cepstral coefficients which carry information about the log spectrum of a stationary time series. These coefficients are estimated by means of a semiparametric model which assumes that the log-likelihood ratio of two or more unknown spectral densities has a linear parametric form. After estimation, the estimated cepstral distance measure is given as an input to a clustering method to produce the disjoint groups of data. Simulated examples show that the method yields good results, even when the processes are not necessarily linear. These cepstral-based clustering algorithms are applied to biological time series. In particular, the proposed methodology effectively identifies distinct and biologically relevant classes of amino acid sequences with the same physicochemical properties, such as hydrophobicity.

AB  - Clustering of stationary time series has become an important tool in many scientific applications, like medicine, finance, etc. Time series clustering methods are based on the calculation of suitable similarity measures which identify the distance between two or more time series. These measures are either computed in the time domain or in the spectral domain. Since the computation of time domain measures is rather cumbersome we resort to spectral domain methods. A new measure of distance is proposed and it is based on the so-called cepstral coefficients which carry information about the log spectrum of a stationary time series. These coefficients are estimated by means of a semiparametric model which assumes that the log-likelihood ratio of two or more unknown spectral densities has a linear parametric form. After estimation, the estimated cepstral distance measure is given as an input to a clustering method to produce the disjoint groups of data. Simulated examples show that the method yields good results, even when the processes are not necessarily linear. These cepstral-based clustering algorithms are applied to biological time series. In particular, the proposed methodology effectively identifies distinct and biologically relevant classes of amino acid sequences with the same physicochemical properties, such as hydrophobicity.

KW  - Exponential model

KW  - Likelihood

KW  - Distance measures

KW  - Spectral analysis

KW  - Periodogram

KW  - Data mining

KW  - Protein sequence analysis

U2  - 10.1016/j.patcog.2008.01.002

DO  - 10.1016/j.patcog.2008.01.002

M3  - Journal article

VL  - 41

SP  - 2398

EP  - 2412

JO  - Pattern Recognition

JF  - Pattern Recognition

SN  - 0031-3203

IS  - 7

ER  - 
