- http://aje.oxfordjournals.org/content/180/1/111
Final published version

Licence: CC BY

Research output: Contribution to journal › Journal article

Published

**Lack of identification in semiparametric instrumental variable models with binary outcomes.** / Burgess, Stephen; Granell, Raquel; Palmer, Tom M.; Sterne, Jonathan A. C.; Didelez, Vanessa.

Burgess, S, Granell, R, Palmer, TM, Sterne, JAC & Didelez, V 2014, 'Lack of identification in semiparametric instrumental variable models with binary outcomes', *American Journal of Epidemiology*, vol. 180, no. 1, pp. 111-119. https://doi.org/10.1093/aje/kwu107

Burgess, S., Granell, R., Palmer, T. M., Sterne, J. A. C., & Didelez, V. (2014). Lack of identification in semiparametric instrumental variable models with binary outcomes. *American Journal of Epidemiology*, *180*(1), 111-119. https://doi.org/10.1093/aje/kwu107

Burgess S, Granell R, Palmer TM, Sterne JAC, Didelez V. Lack of identification in semiparametric instrumental variable models with binary outcomes. American Journal of Epidemiology. 2014 Jul 1;180(1):111-119. https://doi.org/10.1093/aje/kwu107

@article{280d07fd7adc43fda3dfd7e52c139708,

title = "Lack of identification in semiparametric instrumental variable models with binary outcomes",

abstract = "A parameter in a statistical model is identified if its value can be uniquely determined from the distribution of the observable data. We consider the context of an instrumental variable analysis with a binary outcome for estimating a causal risk ratio. The semiparametric generalized method of moments and structural mean model frameworks use estimating equations for parameter estimation. In this paper, we demonstrate that lack of identification can occur in either of these frameworks, especially if the instrument is weak. In particular, the estimating equations may have no solution or multiple solutions. We investigate the relationship between the strength of the instrument and the proportion of simulated data sets for which there is a unique solution of the estimating equations. We see that this proportion does not appear to depend greatly on the sample size, particularly for weak instruments ($\rho^2 \leq 0.01$). Poor identification was observed in a considerable proportion of simulated data sets for instruments explaining up to 10% of the variance in the exposure with sample sizes up to 1 million. In an applied example considering the causal effect of body mass index (weight (kg)/height (m)$^2$) on the probability of early menarche, estimates and standard errors from an automated optimization routine were misleading.",

keywords = "Adolescent, Age Factors, Asthma, Body Mass Index, Causality, Child, Data Interpretation, Statistical, Female, Humans, Menarche, Models, Statistical, Odds Ratio, Sample Size",

author = "Stephen Burgess and Raquel Granell and Palmer, {Tom M.} and Sterne, {Jonathan A. C.} and Vanessa Didelez",

note = "{\textcopyright} The Author 2014. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health.",

year = "2014",

month = jul,

day = "1",

doi = "10.1093/aje/kwu107",

language = "English",

volume = "180",

pages = "111--119",

journal = "American Journal of Epidemiology",

issn = "0002-9262",

publisher = "Oxford University Press",

number = "1",

}

TY  - JOUR
T1  - Lack of identification in semiparametric instrumental variable models with binary outcomes
AU  - Burgess, Stephen
AU  - Granell, Raquel
AU  - Palmer, Tom M.
AU  - Sterne, Jonathan A. C.
AU  - Didelez, Vanessa
N1  - © The Author 2014. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health.
PY  - 2014/7/1
Y1  - 2014/7/1
N2  - A parameter in a statistical model is identified if its value can be uniquely determined from the distribution of the observable data. We consider the context of an instrumental variable analysis with a binary outcome for estimating a causal risk ratio. The semiparametric generalized method of moments and structural mean model frameworks use estimating equations for parameter estimation. In this paper, we demonstrate that lack of identification can occur in either of these frameworks, especially if the instrument is weak. In particular, the estimating equations may have no solution or multiple solutions. We investigate the relationship between the strength of the instrument and the proportion of simulated data sets for which there is a unique solution of the estimating equations. We see that this proportion does not appear to depend greatly on the sample size, particularly for weak instruments (ρ² ≤ 0.01). Poor identification was observed in a considerable proportion of simulated data sets for instruments explaining up to 10% of the variance in the exposure with sample sizes up to 1 million. In an applied example considering the causal effect of body mass index (weight (kg)/height (m)²) on the probability of early menarche, estimates and standard errors from an automated optimization routine were misleading.
AB  - A parameter in a statistical model is identified if its value can be uniquely determined from the distribution of the observable data. We consider the context of an instrumental variable analysis with a binary outcome for estimating a causal risk ratio. The semiparametric generalized method of moments and structural mean model frameworks use estimating equations for parameter estimation. In this paper, we demonstrate that lack of identification can occur in either of these frameworks, especially if the instrument is weak. In particular, the estimating equations may have no solution or multiple solutions. We investigate the relationship between the strength of the instrument and the proportion of simulated data sets for which there is a unique solution of the estimating equations. We see that this proportion does not appear to depend greatly on the sample size, particularly for weak instruments (ρ² ≤ 0.01). Poor identification was observed in a considerable proportion of simulated data sets for instruments explaining up to 10% of the variance in the exposure with sample sizes up to 1 million. In an applied example considering the causal effect of body mass index (weight (kg)/height (m)²) on the probability of early menarche, estimates and standard errors from an automated optimization routine were misleading.
KW  - Adolescent
KW  - Age Factors
KW  - Asthma
KW  - Body Mass Index
KW  - Causality
KW  - Child
KW  - Data Interpretation, Statistical
KW  - Female
KW  - Humans
KW  - Menarche
KW  - Models, Statistical
KW  - Odds Ratio
KW  - Sample Size
U2  - 10.1093/aje/kwu107
DO  - 10.1093/aje/kwu107
M3  - Journal article
C2  - 24859275
VL  - 180
SP  - 111
EP  - 119
JO  - American Journal of Epidemiology
JF  - American Journal of Epidemiology
SN  - 0002-9262
IS  - 1
ER  - 
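The abstract's point that the estimating equations "may have no solution or multiple solutions" can be illustrated with a small numerical sketch. It is not the authors' code: it assumes a multiplicative structural mean model with the commonly used moment condition E[(Z − E[Z]) Y exp(−βX)] = 0, where exp(β) is the causal risk ratio per unit of exposure, and evaluates the sample version of that estimating function on a grid of β values. The helper names (`msmm_estimating_function`, `count_sign_changes`, `columns`) and the three toy data sets are hypothetical, constructed by hand rather than taken from the paper. With a binary exposure the estimating function reduces to A + B·exp(−β), which has at most one root unless A and B are both zero; multiple roots need an exposure with more than two levels, such as the three-level exposure in `TWO_ROOTS` (or a continuous exposure such as body mass index).

```python
import math


def msmm_estimating_function(beta, z, x, y):
    """Sample analogue of E[(Z - E[Z]) * Y * exp(-beta * X)].

    Assumed model: a multiplicative structural mean model in which
    exp(beta) is the causal risk ratio per unit increase in exposure X.
    The true beta makes this quantity zero in expectation.
    """
    z_bar = sum(z) / len(z)
    return sum((zi - z_bar) * yi * math.exp(-beta * xi)
               for zi, xi, yi in zip(z, x, y))


def count_sign_changes(z, x, y, lo=-10.0, hi=10.0, steps=2000):
    """Count sign changes of the estimating function on a grid over [lo, hi].

    Each sign change brackets at least one root. This is a heuristic:
    it misses roots closer together than the step size, roots that land
    exactly on a grid point, and roots outside [lo, hi].
    """
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    values = [msmm_estimating_function(beta, z, x, y) for beta in grid]
    return sum(1 for left, right in zip(values, values[1:]) if left * right < 0)


def columns(rows):
    """Split (z, x, y) rows into separate instrument, exposure, outcome lists."""
    z, x, y = zip(*rows)
    return list(z), list(x), list(y)


# Hypothetical toy data: rows of (instrument z, exposure x, binary outcome y),
# built by hand so the estimating equation has zero, one, or two roots.
NO_ROOT = [(1, 0, 1)] * 2 + [(1, 1, 1)] + [(0, 0, 0)] * 3
ONE_ROOT = [(1, 0, 1)] + [(0, 1, 1)] * 2 + [(1, 0, 0)]
TWO_ROOTS = [(1, 0, 1)] * 3 + [(0, 1, 1)] * 7 + [(1, 2, 1)] * 2 + [(1, 0, 0)] * 2

if __name__ == "__main__":
    for name, rows in [("no root", NO_ROOT), ("one root", ONE_ROOT),
                       ("two roots", TWO_ROOTS)]:
        print(name, count_sign_changes(*columns(rows)))
    # prints: "no root 0", "one root 1", "two roots 2"
```

For `TWO_ROOTS`, writing u = exp(−β) gives an estimating function of 1.5 − 3.5u + u², which is zero at u = 3 and u = 0.5, that is at β = −ln 3 ≈ −1.10 and β = ln 2 ≈ 0.69. An automated optimizer started near either value could report that root as the estimate without signalling that another exists. For `NO_ROOT` the function is 1 + 0.5u, which is positive for every β, so no estimate solves the equation.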