Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics

Data Science Institute

Associated organisational units

Text available via DOI:

https://doi.org/10.3389/fgene.2016.00015
Final published version
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics. / Schizophrenia Working Group of the Psychiatric Genomics Consortium.
In: Frontiers in Genetics, Vol. 7, 15, 16.02.2016.

Harvard

Schizophrenia Working Group of the Psychiatric Genomics Consortium 2016, 'Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics', Frontiers in Genetics, vol. 7, 15. https://doi.org/10.3389/fgene.2016.00015

APA

Schizophrenia Working Group of the Psychiatric Genomics Consortium (2016). Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics. Frontiers in Genetics, 7, Article 15. https://doi.org/10.3389/fgene.2016.00015

Vancouver

Schizophrenia Working Group of the Psychiatric Genomics Consortium. Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics. Frontiers in Genetics. 2016 Feb 16;7:15. doi: 10.3389/fgene.2016.00015

Author

Schizophrenia Working Group of the Psychiatric Genomics Consortium. / Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics. In: Frontiers in Genetics. 2016; Vol. 7.

Bibtex

@article{f3d4f75ad17843a98d6b923625182014,

title = "Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics",

abstract = "Genome-wide Association Studies (GWAS) result in millions of summary statistics ({"}z-scores{"}) for single nucleotide polymorphism (SNP) associations with phenotypes. These rich datasets afford deep insights into the nature and extent of genetic contributions to complex phenotypes such as psychiatric disorders, which are understood to have substantial genetic components that arise from very large numbers of SNPs. The complexity of the datasets, however, poses a significant challenge to maximizing their utility. This is reflected in a need for better understanding the landscape of z-scores, as such knowledge would enhance causal SNP and gene discovery, help elucidate mechanistic pathways, and inform future study design. Here we present a parsimonious methodology for modeling effect sizes and replication probabilities, relying only on summary statistics from GWAS substudies, and a scheme allowing for direct empirical validation. We show that modeling z-scores as a mixture of Gaussians is conceptually appropriate, in particular taking into account ubiquitous non-null effects that are likely in the datasets due to weak linkage disequilibrium with causal SNPs. The four-parameter model allows for estimating the degree of polygenicity of the phenotype and predicting the proportion of chip heritability explainable by genome-wide significant SNPs in future studies with larger sample sizes. We apply the model to recent GWAS of schizophrenia (N = 82,315) and putamen volume (N = 12,596), with approximately 9.3 million SNP z-scores in both cases. We show that, over a broad range of z-scores and sample sizes, the model accurately predicts expectation estimates of true effect sizes and replication probabilities in multistage GWAS designs. We assess the degree to which effect sizes are over-estimated when based on linear-regression association coefficients. 
We estimate the polygenicity of schizophrenia to be 0.037 and the putamen to be 0.001, while the respective sample sizes required to approach fully explaining the chip heritability are $10^{6}$ and $10^{5}$. The model can be extended to incorporate prior knowledge such as pleiotropy and SNP annotation. The current findings suggest that the model is applicable to a broad array of complex phenotypes and will enhance understanding of their genetic architectures.",

author = "Dominic Holland and Yunpeng Wang and Thompson, {Wesley K} and Andrew Schork and Chi-Hua Chen and Min-Tzu Lo and Aree Witoelar and Thomas Werge and Michael O'Donovan and Andreassen, {Ole A} and Dale, {Anders M} and Jo Knight and {Schizophrenia Working Group of the Psychiatric Genomics Consortium}",

year = "2016",

month = feb,

day = "16",

doi = "10.3389/fgene.2016.00015",

language = "English",

volume = "7",

journal = "Frontiers in Genetics",

issn = "1664-8021",

publisher = "Frontiers Media S.A.",

}

RIS

TY - JOUR

T1 - Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics

AU - Holland, Dominic

AU - Wang, Yunpeng

AU - Thompson, Wesley K

AU - Schork, Andrew

AU - Chen, Chi-Hua

AU - Lo, Min-Tzu

AU - Witoelar, Aree

AU - Werge, Thomas

AU - O'Donovan, Michael

AU - Andreassen, Ole A

AU - Dale, Anders M

AU - Knight, Jo

AU - Schizophrenia Working Group of the Psychiatric Genomics Consortium

PY - 2016/2/16

Y1 - 2016/2/16

N2 - Genome-wide Association Studies (GWAS) result in millions of summary statistics ("z-scores") for single nucleotide polymorphism (SNP) associations with phenotypes. These rich datasets afford deep insights into the nature and extent of genetic contributions to complex phenotypes such as psychiatric disorders, which are understood to have substantial genetic components that arise from very large numbers of SNPs. The complexity of the datasets, however, poses a significant challenge to maximizing their utility. This is reflected in a need for better understanding the landscape of z-scores, as such knowledge would enhance causal SNP and gene discovery, help elucidate mechanistic pathways, and inform future study design. Here we present a parsimonious methodology for modeling effect sizes and replication probabilities, relying only on summary statistics from GWAS substudies, and a scheme allowing for direct empirical validation. We show that modeling z-scores as a mixture of Gaussians is conceptually appropriate, in particular taking into account ubiquitous non-null effects that are likely in the datasets due to weak linkage disequilibrium with causal SNPs. The four-parameter model allows for estimating the degree of polygenicity of the phenotype and predicting the proportion of chip heritability explainable by genome-wide significant SNPs in future studies with larger sample sizes. We apply the model to recent GWAS of schizophrenia (N = 82,315) and putamen volume (N = 12,596), with approximately 9.3 million SNP z-scores in both cases. We show that, over a broad range of z-scores and sample sizes, the model accurately predicts expectation estimates of true effect sizes and replication probabilities in multistage GWAS designs. We assess the degree to which effect sizes are over-estimated when based on linear-regression association coefficients. 
We estimate the polygenicity of schizophrenia to be 0.037 and the putamen to be 0.001, while the respective sample sizes required to approach fully explaining the chip heritability are 10^6 and 10^5. The model can be extended to incorporate prior knowledge such as pleiotropy and SNP annotation. The current findings suggest that the model is applicable to a broad array of complex phenotypes and will enhance understanding of their genetic architectures.

AB - Genome-wide Association Studies (GWAS) result in millions of summary statistics ("z-scores") for single nucleotide polymorphism (SNP) associations with phenotypes. These rich datasets afford deep insights into the nature and extent of genetic contributions to complex phenotypes such as psychiatric disorders, which are understood to have substantial genetic components that arise from very large numbers of SNPs. The complexity of the datasets, however, poses a significant challenge to maximizing their utility. This is reflected in a need for better understanding the landscape of z-scores, as such knowledge would enhance causal SNP and gene discovery, help elucidate mechanistic pathways, and inform future study design. Here we present a parsimonious methodology for modeling effect sizes and replication probabilities, relying only on summary statistics from GWAS substudies, and a scheme allowing for direct empirical validation. We show that modeling z-scores as a mixture of Gaussians is conceptually appropriate, in particular taking into account ubiquitous non-null effects that are likely in the datasets due to weak linkage disequilibrium with causal SNPs. The four-parameter model allows for estimating the degree of polygenicity of the phenotype and predicting the proportion of chip heritability explainable by genome-wide significant SNPs in future studies with larger sample sizes. We apply the model to recent GWAS of schizophrenia (N = 82,315) and putamen volume (N = 12,596), with approximately 9.3 million SNP z-scores in both cases. We show that, over a broad range of z-scores and sample sizes, the model accurately predicts expectation estimates of true effect sizes and replication probabilities in multistage GWAS designs. We assess the degree to which effect sizes are over-estimated when based on linear-regression association coefficients. 
We estimate the polygenicity of schizophrenia to be 0.037 and the putamen to be 0.001, while the respective sample sizes required to approach fully explaining the chip heritability are 10^6 and 10^5. The model can be extended to incorporate prior knowledge such as pleiotropy and SNP annotation. The current findings suggest that the model is applicable to a broad array of complex phenotypes and will enhance understanding of their genetic architectures.

U2 - 10.3389/fgene.2016.00015

DO - 10.3389/fgene.2016.00015

M3 - Journal article

C2 - 26909100

VL - 7

JO - Frontiers in Genetics

JF - Frontiers in Genetics

SN - 1664-8021

M1 - 15

ER -
