Electronic data

  • srep13373

    Rights statement: This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

    Final published version, 665 KB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License

Links

Text available via DOI: 10.1038/srep13373

Smoking gun or circumstantial evidence? Comparison of statistical learning methods using functional annotations for prioritizing risk variants

Research output: Contribution to journal › Journal article › peer-review

Published

Standard

Smoking gun or circumstantial evidence? Comparison of statistical learning methods using functional annotations for prioritizing risk variants. / Gagliano, Sarah A.; Ravji, Reena; Barnes, Michael R.; Weale, Michael E.; Knight, Jo.

In: Scientific Reports, Vol. 5, 13373, 24.08.2015.

Author

Gagliano, Sarah A.; Ravji, Reena; Barnes, Michael R.; Weale, Michael E.; Knight, Jo. / Smoking gun or circumstantial evidence? Comparison of statistical learning methods using functional annotations for prioritizing risk variants. In: Scientific Reports. 2015; Vol. 5.

Bibtex

@article{b13dc34e64da47c6abb8893a48376079,
title = "Smoking gun or circumstantial evidence?: comparison of statistical learning methods using functional annotations for prioritizing risk variants",
abstract = "Although technology has triumphed in facilitating routine genome sequencing, new challenges have been created for the data-analyst. Genome-scale surveys of human variation generate volumes of data that far exceed capabilities for laboratory characterization. By incorporating functional annotations as predictors, statistical learning has been widely investigated for prioritizing genetic variants likely to be associated with complex disease. We compared three published prioritization procedures, which use different statistical learning algorithms and different predictors with regard to the quantity, type and coding. We also explored different combinations of algorithm and annotation set. As an application, we tested which methodology performed best for prioritizing variants using data from a large schizophrenia meta-analysis by the Psychiatric Genomics Consortium. Results suggest that all methods have considerable (and similar) predictive accuracies (AUCs 0.64-0.71) in test set data, but there is more variability in the application to the schizophrenia GWAS. In conclusion, a variety of algorithms and annotations seem to have a similar potential to effectively enrich true risk variants in genome-scale datasets, however none offer more than incremental improvement in prediction. We discuss how methods might be evolved for risk variant prediction to address the impending bottleneck of the new generation of genome re-sequencing studies.",
author = "Gagliano, {Sarah A.} and Reena Ravji and Barnes, {Michael R.} and Weale, {Michael E.} and Jo Knight",
note = "This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article{\textquoteright}s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/",
year = "2015",
month = aug,
day = "24",
doi = "10.1038/srep13373",
language = "English",
volume = "5",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Nature Publishing Group",
}

RIS

TY - JOUR

T1 - Smoking gun or circumstantial evidence?

T2 - comparison of statistical learning methods using functional annotations for prioritizing risk variants

AU - Gagliano, Sarah A.

AU - Ravji, Reena

AU - Barnes, Michael R.

AU - Weale, Michael E.

AU - Knight, Jo

N1 - This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

PY - 2015/8/24

Y1 - 2015/8/24

AB - Although technology has triumphed in facilitating routine genome sequencing, new challenges have been created for the data-analyst. Genome-scale surveys of human variation generate volumes of data that far exceed capabilities for laboratory characterization. By incorporating functional annotations as predictors, statistical learning has been widely investigated for prioritizing genetic variants likely to be associated with complex disease. We compared three published prioritization procedures, which use different statistical learning algorithms and different predictors with regard to the quantity, type and coding. We also explored different combinations of algorithm and annotation set. As an application, we tested which methodology performed best for prioritizing variants using data from a large schizophrenia meta-analysis by the Psychiatric Genomics Consortium. Results suggest that all methods have considerable (and similar) predictive accuracies (AUCs 0.64-0.71) in test set data, but there is more variability in the application to the schizophrenia GWAS. In conclusion, a variety of algorithms and annotations seem to have a similar potential to effectively enrich true risk variants in genome-scale datasets, however none offer more than incremental improvement in prediction. We discuss how methods might be evolved for risk variant prediction to address the impending bottleneck of the new generation of genome re-sequencing studies.

U2 - 10.1038/srep13373

DO - 10.1038/srep13373

M3 - Journal article

C2 - 26300220

VL - 5

JO - Scientific Reports

JF - Scientific Reports

SN - 2045-2322

M1 - 13373

ER -