Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters

Data Science Institute

Text available via DOI:

https://doi.org/10.1186/1471-2105-14-252
Final published version

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters. / Hensman, James; Lawrence, Neil D.; Rattray, Magnus.
In: BMC Bioinformatics, Vol. 14, No. 1, 252, 20.08.2013.

Bibtex

@article{b9d5d98e0d3048658baa22762d6be591,

title = "Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters",

abstract = "Background: Time course data from microarrays and high-throughput sequencing experiments require simple, computationally efficient and powerful statistical models to extract meaningful biological signal, and for tasks such as data fusion and clustering. Existing methodologies fail to capture either the temporal or replicated nature of the experiments, and often impose constraints on the data collection process, such as regularly spaced samples, or similar sampling schema across replications. Results: We propose hierarchical Gaussian processes as a general model of gene expression time-series, with application to a variety of problems. In particular, we illustrate the method's capacity for missing data imputation, data fusion and clustering. The method can impute data which is missing both systematically and at random: in a hold-out test on real data, performance is significantly better than commonly used imputation methods. The method's ability to model inter- and intra-cluster variance leads to more biologically meaningful clusters. The approach removes the necessity for evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset with irregular replications. Conclusion: The hierarchical Gaussian process model provides an excellent statistical basis for several gene-expression time-series tasks. It has only a few additional parameters over a regular GP, has negligible additional complexity, is easily implemented and can be integrated into several existing algorithms. Our experiments were implemented in Python, and are available from the authors' website: http://staffwww.dcs.shef.ac.uk/people/J.Hensman/.",

author = "Hensman, James and Lawrence, {Neil D.} and Rattray, Magnus",

year = "2013",

month = aug,

day = "20",

doi = "10.1186/1471-2105-14-252",

language = "English",

volume = "14",

journal = "BMC Bioinformatics",

issn = "1471-2105",

publisher = "BioMed Central",

number = "1",

}

RIS

TY - JOUR

T1 - Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters

AU - Hensman, James

AU - Lawrence, Neil D.

AU - Rattray, Magnus

PY - 2013/8/20

Y1 - 2013/8/20

N2 - Background: Time course data from microarrays and high-throughput sequencing experiments require simple, computationally efficient and powerful statistical models to extract meaningful biological signal, and for tasks such as data fusion and clustering. Existing methodologies fail to capture either the temporal or replicated nature of the experiments, and often impose constraints on the data collection process, such as regularly spaced samples, or similar sampling schema across replications. Results: We propose hierarchical Gaussian processes as a general model of gene expression time-series, with application to a variety of problems. In particular, we illustrate the method's capacity for missing data imputation, data fusion and clustering. The method can impute data which is missing both systematically and at random: in a hold-out test on real data, performance is significantly better than commonly used imputation methods. The method's ability to model inter- and intra-cluster variance leads to more biologically meaningful clusters. The approach removes the necessity for evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset with irregular replications. Conclusion: The hierarchical Gaussian process model provides an excellent statistical basis for several gene-expression time-series tasks. It has only a few additional parameters over a regular GP, has negligible additional complexity, is easily implemented and can be integrated into several existing algorithms. Our experiments were implemented in Python, and are available from the authors' website: http://staffwww.dcs.shef.ac.uk/people/J.Hensman/.

AB - Background: Time course data from microarrays and high-throughput sequencing experiments require simple, computationally efficient and powerful statistical models to extract meaningful biological signal, and for tasks such as data fusion and clustering. Existing methodologies fail to capture either the temporal or replicated nature of the experiments, and often impose constraints on the data collection process, such as regularly spaced samples, or similar sampling schema across replications. Results: We propose hierarchical Gaussian processes as a general model of gene expression time-series, with application to a variety of problems. In particular, we illustrate the method's capacity for missing data imputation, data fusion and clustering. The method can impute data which is missing both systematically and at random: in a hold-out test on real data, performance is significantly better than commonly used imputation methods. The method's ability to model inter- and intra-cluster variance leads to more biologically meaningful clusters. The approach removes the necessity for evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset with irregular replications. Conclusion: The hierarchical Gaussian process model provides an excellent statistical basis for several gene-expression time-series tasks. It has only a few additional parameters over a regular GP, has negligible additional complexity, is easily implemented and can be integrated into several existing algorithms. Our experiments were implemented in Python, and are available from the authors' website: http://staffwww.dcs.shef.ac.uk/people/J.Hensman/.

U2 - 10.1186/1471-2105-14-252

DO - 10.1186/1471-2105-14-252

M3 - Journal article

C2 - 23962281

AN - SCOPUS:84883634438

VL - 14

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 252

ER -
