Modeling linkage disequilibrium increases accuracy of polygenic risk scores

Data Science Institute

Associated organisational units

Electronic data

final
Rights statement: This is the author’s version of a work that was accepted for publication in American Journal of Human Genetics. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in American Journal of Human Genetics, 97, 4, 2015 DOI: 10.1016/j.ajhg.2015.09.001
Accepted author manuscript, 613 KB, PDF document
Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Text available via DOI:

https://doi.org/10.1016/j.ajhg.2015.09.001
Final published version

Keywords

Genome-Wide Association Study; Genotype; Humans; Linkage Disequilibrium; Models, Theoretical; Multifactorial Inheritance; Multiple Sclerosis; Phenotype; Polymorphism, Single Nucleotide; Prognosis; Quantitative Trait Loci; Schizophrenia

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Schizophrenia Working Group of the Psychiatric Genomics Consortium, Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study

Journal publication date	1 October 2015
Journal	American Journal of Human Genetics
Issue number	4
Volume	97
Number of pages	17
Pages (from-to)	576-592
Publication Status	Published
Original language	English

Abstract

Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, predicted R² increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
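
The idea of inferring posterior mean effect sizes from marginal association statistics and an LD matrix can be illustrated with a small sketch. It is not the LDpred implementation. It assumes an infinitesimal (Gaussian) prior, under which the posterior mean is approximately the solution of (M / (N h²) · I + D) β = β̂, where D is the LD (correlation) matrix from a reference panel, N the training sample size, M the total number of markers, and h² the heritability. The abstract does not state this formula, so it should be checked against the published paper. All function and parameter names below are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ld_aware_posterior_mean(beta_hat, ld, n_samples, h2, n_markers_total):
    """Shrink marginal effects using LD, assuming an infinitesimal prior.

    beta_hat        -- marginal effect estimates for the markers in one region
    ld              -- LD (correlation) matrix for those markers
    n_samples       -- training sample size N (assumed known)
    h2              -- heritability explained by all markers (assumed known)
    n_markers_total -- total number of markers M across the genome
    """
    lam = n_markers_total / (n_samples * h2)
    k = len(beta_hat)
    a = [[ld[i][j] + (lam if i == j else 0.0) for j in range(k)]
         for i in range(k)]
    return solve(a, beta_hat)


# Toy example: two markers with correlation 0.9; only the first is causal,
# so the second marker's marginal estimate is inflated purely through LD.
ld = [[1.0, 0.9], [0.9, 1.0]]
beta_hat = [0.10, 0.09]
posterior = ld_aware_posterior_mean(beta_hat, ld, n_samples=10_000,
                                    h2=0.5, n_markers_total=2)
print(posterior)
```

In this toy case the LD-aware estimate assigns almost all of the signal to the first marker, whereas pruning followed by thresholding would keep one marker's marginal estimate and discard the other outright. With an identity LD matrix the same function reduces to uniform shrinkage of each marginal estimate by 1 / (1 + M / (N h²)).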
