Investigation of the ability of haplotype association and logistic regression to identify associated susceptibility loci

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Investigation of the ability of haplotype association and logistic regression to identify associated susceptibility loci. / North, Bernard V.; Sham, Pak C.; Knight, Jo et al.
In: Annals of Human Genetics, Vol. 70, No. 6, 11.2006, p. 893-906.

Vancouver

North BV, Sham PC, Knight J, Martin ER, Curtis D. Investigation of the ability of haplotype association and logistic regression to identify associated susceptibility loci. Annals of Human Genetics. 2006 Nov;70(6):893-906. Epub 2006 Jul 25. doi: 10.1111/j.1469-1809.2006.00301.x

Author

North, Bernard V. ; Sham, Pak C. ; Knight, Jo et al. / Investigation of the ability of haplotype association and logistic regression to identify associated susceptibility loci. In: Annals of Human Genetics. 2006 ; Vol. 70, No. 6. pp. 893-906.

BibTeX

@article{68874d062acf47ccb0ae457373179c2a,
title = "Investigation of the ability of haplotype association and logistic regression to identify associated susceptibility loci",
abstract = "While finely spaced markers are increasingly being used in case-control association studies in attempts to identify susceptibility loci, not enough is yet known as to the optimal spacing of such markers, their likely power to detect association, the relative merits of single marker versus multimarker analysis, or which methods of analysis may be optimal. Some investigations of these issues have used markers simulated under different theoretical models of population evolution. However the HapMap project and other sources provide real datasets which can be used to obtain a more realistic view of the performance of these approaches. SNPs around APOE and from two HapMap regions were used to obtain information regarding linkage disequilibrium (LD) relationships between polymorphisms, and these real patterns of LD were used to simulate datasets such as would be obtained in case-control studies were these SNPs to influence susceptibility to disease. The datasets obtained were analysed using tests for heterogeneity of estimated haplotype frequencies and using logistic regression analyses in which only main effects from each marker were considered. All markers surrounding the putative susceptibility locus were analysed, using sets of either 1, 2, 3 or 4 markers at a time. Some markers within 150 kb of the susceptibility locus were able to detect association. At distances less than 100 kb there was no correlation between the distance from the susceptibility locus and the strength of evidence for association. When the average inter-locus spacing is 25 kb many loci would not be detected, while when the spacing is as low as 2 kb one can be fairly confident that at least one marker will be in strong enough LD with the susceptibility locus to enable association to be detected, if the susceptibility locus has a strong enough effect relative to the sample size. 
With an inter-locus spacing of 4 kb some susceptibility loci did not have a marker locus in strong LD, potentially undermining the ability to detect association. There was little difference in the performance of haplotype-based analysis compared with logistic regression considering effects of each marker as separate. Multimarker analysis on occasion produced results which were much more highly significant than single marker analysis, but only very rarely. Our results support the view that if markers are randomly selected then a spacing as low as 2 kb is desirable. Multimarker analysis can sometimes be more powerful than single marker analysis so both should be performed. However, because it is rare for multimarker analysis to be much more highly significant than single marker analysis one should strongly suspect that when such results occur they may be due to mistakes in genotyping or through some other artefact. Haplotype analysis may be more prone to such problems than logistic regression, suggesting that the latter method might be preferred.",
keywords = "Apolipoproteins E, Genetic Markers, Genetic Predisposition to Disease, Haplotypes, Humans, Likelihood Functions, Linkage Disequilibrium, Logistic Models, Polymorphism, Single Nucleotide",
author = "North, Bernard V. and Sham, Pak C. and Knight, Jo and Martin, E. R. and Curtis, David",
year = "2006",
month = nov,
doi = "10.1111/j.1469-1809.2006.00301.x",
language = "English",
volume = "70",
pages = "893--906",
journal = "Annals of Human Genetics",
issn = "0003-4800",
publisher = "Wiley-Blackwell",
number = "6",

}

RIS

TY  - JOUR
T1  - Investigation of the ability of haplotype association and logistic regression to identify associated susceptibility loci
AU  - North, Bernard V.
AU  - Sham, Pak C.
AU  - Knight, Jo
AU  - Martin, E. R.
AU  - Curtis, David
PY  - 2006/11
Y1  - 2006/11
N2  - While finely spaced markers are increasingly being used in case-control association studies in attempts to identify susceptibility loci, not enough is yet known as to the optimal spacing of such markers, their likely power to detect association, the relative merits of single marker versus multimarker analysis, or which methods of analysis may be optimal. Some investigations of these issues have used markers simulated under different theoretical models of population evolution. However the HapMap project and other sources provide real datasets which can be used to obtain a more realistic view of the performance of these approaches. SNPs around APOE and from two HapMap regions were used to obtain information regarding linkage disequilibrium (LD) relationships between polymorphisms, and these real patterns of LD were used to simulate datasets such as would be obtained in case-control studies were these SNPs to influence susceptibility to disease. The datasets obtained were analysed using tests for heterogeneity of estimated haplotype frequencies and using logistic regression analyses in which only main effects from each marker were considered. All markers surrounding the putative susceptibility locus were analysed, using sets of either 1, 2, 3 or 4 markers at a time. Some markers within 150 kb of the susceptibility locus were able to detect association. At distances less than 100 kb there was no correlation between the distance from the susceptibility locus and the strength of evidence for association. When the average inter-locus spacing is 25 kb many loci would not be detected, while when the spacing is as low as 2 kb one can be fairly confident that at least one marker will be in strong enough LD with the susceptibility locus to enable association to be detected, if the susceptibility locus has a strong enough effect relative to the sample size.
With an inter-locus spacing of 4 kb some susceptibility loci did not have a marker locus in strong LD, potentially undermining the ability to detect association. There was little difference in the performance of haplotype-based analysis compared with logistic regression considering effects of each marker as separate. Multimarker analysis on occasion produced results which were much more highly significant than single marker analysis, but only very rarely. Our results support the view that if markers are randomly selected then a spacing as low as 2 kb is desirable. Multimarker analysis can sometimes be more powerful than single marker analysis so both should be performed. However, because it is rare for multimarker analysis to be much more highly significant than single marker analysis one should strongly suspect that when such results occur they may be due to mistakes in genotyping or through some other artefact. Haplotype analysis may be more prone to such problems than logistic regression, suggesting that the latter method might be preferred.
AB  - While finely spaced markers are increasingly being used in case-control association studies in attempts to identify susceptibility loci, not enough is yet known as to the optimal spacing of such markers, their likely power to detect association, the relative merits of single marker versus multimarker analysis, or which methods of analysis may be optimal. Some investigations of these issues have used markers simulated under different theoretical models of population evolution. However the HapMap project and other sources provide real datasets which can be used to obtain a more realistic view of the performance of these approaches. SNPs around APOE and from two HapMap regions were used to obtain information regarding linkage disequilibrium (LD) relationships between polymorphisms, and these real patterns of LD were used to simulate datasets such as would be obtained in case-control studies were these SNPs to influence susceptibility to disease. The datasets obtained were analysed using tests for heterogeneity of estimated haplotype frequencies and using logistic regression analyses in which only main effects from each marker were considered. All markers surrounding the putative susceptibility locus were analysed, using sets of either 1, 2, 3 or 4 markers at a time. Some markers within 150 kb of the susceptibility locus were able to detect association. At distances less than 100 kb there was no correlation between the distance from the susceptibility locus and the strength of evidence for association. When the average inter-locus spacing is 25 kb many loci would not be detected, while when the spacing is as low as 2 kb one can be fairly confident that at least one marker will be in strong enough LD with the susceptibility locus to enable association to be detected, if the susceptibility locus has a strong enough effect relative to the sample size.
With an inter-locus spacing of 4 kb some susceptibility loci did not have a marker locus in strong LD, potentially undermining the ability to detect association. There was little difference in the performance of haplotype-based analysis compared with logistic regression considering effects of each marker as separate. Multimarker analysis on occasion produced results which were much more highly significant than single marker analysis, but only very rarely. Our results support the view that if markers are randomly selected then a spacing as low as 2 kb is desirable. Multimarker analysis can sometimes be more powerful than single marker analysis so both should be performed. However, because it is rare for multimarker analysis to be much more highly significant than single marker analysis one should strongly suspect that when such results occur they may be due to mistakes in genotyping or through some other artefact. Haplotype analysis may be more prone to such problems than logistic regression, suggesting that the latter method might be preferred.
KW  - Apolipoproteins E
KW  - Genetic Markers
KW  - Genetic Predisposition to Disease
KW  - Haplotypes
KW  - Humans
KW  - Likelihood Functions
KW  - Linkage Disequilibrium
KW  - Logistic Models
KW  - Polymorphism, Single Nucleotide
U2  - 10.1111/j.1469-1809.2006.00301.x
DO  - 10.1111/j.1469-1809.2006.00301.x
M3  - Journal article
C2  - 17044864
VL  - 70
SP  - 893
EP  - 906
JO  - Annals of Human Genetics
JF  - Annals of Human Genetics
SN  - 0003-4800
IS  - 6
ER  - 