Rights statement: This is the author’s version of a work that was accepted for publication in American Journal of Human Genetics. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in American Journal of Human Genetics, 97, 4, 2015 DOI: 10.1016/j.ajhg.2015.09.001
Accepted author manuscript, 613 KB, PDF document
Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
TY - JOUR
T1 - Modeling linkage disequilibrium increases accuracy of polygenic risk scores
AU - Vilhjálmsson, Bjarni J.
AU - Yang, Jian
AU - Finucane, Hilary K.
AU - Gusev, Alexander
AU - Lindström, Sara
AU - Ripke, Stephan
AU - Genovese, Giulio
AU - Loh, Po-Ru
AU - Bhatia, Gaurav
AU - Do, Ron
AU - Hayeck, Tristan
AU - Won, Hong-Hee
AU - Kathiresan, Sekar
AU - Pato, Michele
AU - Pato, Carlos
AU - Tamimi, Rulla
AU - Stahl, Eli
AU - Zaitlen, Noah
AU - Pasaniuc, Bogdan
AU - Belbin, Gillian
AU - Kenny, Eimear E.
AU - Schierup, Mikkel H.
AU - De Jager, Philip
AU - Patsopoulos, Nikolaos A.
AU - McCarroll, Steve
AU - Daly, Mark
AU - Purcell, Shaun
AU - Chasman, Daniel
AU - Neale, Benjamin
AU - Goddard, Michael
AU - Visscher, Peter M.
AU - Kraft, Peter
AU - Patterson, Nick
AU - Price, Alkes L.
AU - Knight, Jo
AU - Schizophrenia Working Group of the Psychiatric Genomics Consortium
AU - Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study
N1 - This is the author’s version of a work that was accepted for publication in American Journal of Human Genetics. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in American Journal of Human Genetics, 97, 4, 2015 DOI: 10.1016/j.ajhg.2015.09.001
PY - 2015/10/1
Y1 - 2015/10/1
AB - Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, predicted R² increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
KW - Genome-Wide Association Study
KW - Genotype
KW - Humans
KW - Linkage Disequilibrium
KW - Models, Theoretical
KW - Multifactorial Inheritance
KW - Multiple Sclerosis
KW - Phenotype
KW - Polymorphism, Single Nucleotide
KW - Prognosis
KW - Quantitative Trait Loci
KW - Schizophrenia
U2 - 10.1016/j.ajhg.2015.09.001
DO - 10.1016/j.ajhg.2015.09.001
M3 - Journal article
C2 - 26430803
VL - 97
SP - 576
EP - 592
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
SN - 0002-9297
IS - 4
ER -