Research output: Contribution to Journal/Magazine › Journal article › peer-review
}
TY - JOUR
T1 - Spectral density ratio based clustering methods for the binary segmentation of protein sequences
T2 - A comparative study
AU - Ioannou, A.
AU - Fokianos, K.
AU - Promponas, V.J.
PY - 2010/5
Y1 - 2010/5
N2 - We compare several spectral domain based clustering methods for partitioning protein sequence data. The main instrument for this exercise is the spectral density ratio model, which specifies that the logarithmic ratio of two or more unknown spectral density functions has a parametric linear combination of cosines. Maximum likelihood inference is worked out in detail and it is shown that its output yields several distance measures among independent stationary time series. These similarity indices are suitable for clustering time series data based on their second order properties. Other spectral domain based distances are investigated as well; and we compare all methods and distances to the problem of producing segmentations of bacterial outer membrane proteins consistent with their transmembrane topology. Protein sequences are transformed to time series data by employing numerical scales of physicochemical parameters. We also present interesting results on the prediction of transmembrane β-strands, based on the clustering outcome, for a representative set of bacterial outer membrane proteins with given three-dimensional structure.
AB - We compare several spectral domain based clustering methods for partitioning protein sequence data. The main instrument for this exercise is the spectral density ratio model, which specifies that the logarithmic ratio of two or more unknown spectral density functions has a parametric linear combination of cosines. Maximum likelihood inference is worked out in detail and it is shown that its output yields several distance measures among independent stationary time series. These similarity indices are suitable for clustering time series data based on their second order properties. Other spectral domain based distances are investigated as well; and we compare all methods and distances to the problem of producing segmentations of bacterial outer membrane proteins consistent with their transmembrane topology. Protein sequences are transformed to time series data by employing numerical scales of physicochemical parameters. We also present interesting results on the prediction of transmembrane β-strands, based on the clustering outcome, for a representative set of bacterial outer membrane proteins with given three-dimensional structure.
KW - Distance measures
KW - OMP topology prediction
KW - Physicochemical parameters
KW - Protein sequence segmentation
KW - Spectral analysis
KW - Periodogram
KW - Time series
U2 - 10.1016/j.biosystems.2010.02.008
DO - 10.1016/j.biosystems.2010.02.008
M3 - Journal article
VL - 100
SP - 132
EP - 143
JO - BioSystems
JF - BioSystems
SN - 0303-2647
IS - 2
ER -