Semi-automatic selection of summary statistics for ABC model choice

Associated organisational units

School of Mathematical Sciences

Text available via DOI:

https://doi.org/10.1515/sagmb-2013-0012
Final published version

Keywords

ABC, model selection, sufficiency, Campylobacter, coalescent

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Semi-automatic selection of summary statistics for ABC model choice. / Prangle, Dennis; Fearnhead, Paul; Cox, Murray et al.
In: Statistical Applications in Genetics and Molecular Biology, Vol. 13, No. 1, 02.2014, p. 67-82.

Harvard

Prangle, D, Fearnhead, P, Cox, M, Biggs, P & French, N 2014, 'Semi-automatic selection of summary statistics for ABC model choice', Statistical Applications in Genetics and Molecular Biology, vol. 13, no. 1, pp. 67-82. https://doi.org/10.1515/sagmb-2013-0012

APA

Prangle, D., Fearnhead, P., Cox, M., Biggs, P., & French, N. (2014). Semi-automatic selection of summary statistics for ABC model choice. Statistical Applications in Genetics and Molecular Biology, 13(1), 67-82. https://doi.org/10.1515/sagmb-2013-0012

Vancouver

Prangle D, Fearnhead P, Cox M, Biggs P, French N. Semi-automatic selection of summary statistics for ABC model choice. Statistical Applications in Genetics and Molecular Biology. 2014 Feb;13(1):67-82. doi: 10.1515/sagmb-2013-0012

Author

Prangle, Dennis ; Fearnhead, Paul ; Cox, Murray et al. / Semi-automatic selection of summary statistics for ABC model choice. In: Statistical Applications in Genetics and Molecular Biology. 2014 ; Vol. 13, No. 1. pp. 67-82.

Bibtex

@article{41bafc07786c429cb8d5f70d24f9d680,

title = "Semi-automatic selection of summary statistics for ABC model choice",

abstract = "A central statistical goal is to choose between alternative explanatory models of data. In many modern applications, such as population genetics, it is not possible to apply standard methods based on evaluating the likelihood functions of the models, as these are numerically intractable. Approximate Bayesian computation (ABC) is a commonly used alternative for such situations. ABC simulates data x for many parameter values under each model, which is compared to the observed data x_obs. More weight is placed on models under which S(x) is close to S(x_obs), where S maps data to a vector of summary statistics. Previous work has shown the choice of S is crucial to the efficiency and accuracy of ABC. This paper provides a method to select good summary statistics for model choice. It uses a preliminary step, simulating many x values from all models and fitting regressions to this with the model as response. The resulting model weight estimators are used as S in an ABC analysis. Theoretical results are given to justify this as approximating low dimensional sufficient statistics. A substantive application is presented: choosing between competing coalescent models of demographic growth for Campylobacter jejuni in New Zealand using multi-locus sequence typing data.",

keywords = "ABC, model selection, sufficiency, Campylobacter, coalescent",

author = "Dennis Prangle and Paul Fearnhead and Murray Cox and Patrick Biggs and Nigel French",

year = "2014",

month = feb,

doi = "10.1515/sagmb-2013-0012",

language = "English",

volume = "13",

pages = "67--82",

journal = "Statistical Applications in Genetics and Molecular Biology",

issn = "2194-6302",

publisher = "De Gruyter",

number = "1",

}

RIS

TY - JOUR

T1 - Semi-automatic selection of summary statistics for ABC model choice

AU - Prangle, Dennis

AU - Fearnhead, Paul

AU - Cox, Murray

AU - Biggs, Patrick

AU - French, Nigel

PY - 2014/2

Y1 - 2014/2

N2 - A central statistical goal is to choose between alternative explanatory models of data. In many modern applications, such as population genetics, it is not possible to apply standard methods based on evaluating the likelihood functions of the models, as these are numerically intractable. Approximate Bayesian computation (ABC) is a commonly used alternative for such situations. ABC simulates data x for many parameter values under each model, which is compared to the observed data x_obs. More weight is placed on models under which S(x) is close to S(x_obs), where S maps data to a vector of summary statistics. Previous work has shown the choice of S is crucial to the efficiency and accuracy of ABC. This paper provides a method to select good summary statistics for model choice. It uses a preliminary step, simulating many x values from all models and fitting regressions to this with the model as response. The resulting model weight estimators are used as S in an ABC analysis. Theoretical results are given to justify this as approximating low dimensional sufficient statistics. A substantive application is presented: choosing between competing coalescent models of demographic growth for Campylobacter jejuni in New Zealand using multi-locus sequence typing data.

AB - A central statistical goal is to choose between alternative explanatory models of data. In many modern applications, such as population genetics, it is not possible to apply standard methods based on evaluating the likelihood functions of the models, as these are numerically intractable. Approximate Bayesian computation (ABC) is a commonly used alternative for such situations. ABC simulates data x for many parameter values under each model, which is compared to the observed data x_obs. More weight is placed on models under which S(x) is close to S(x_obs), where S maps data to a vector of summary statistics. Previous work has shown the choice of S is crucial to the efficiency and accuracy of ABC. This paper provides a method to select good summary statistics for model choice. It uses a preliminary step, simulating many x values from all models and fitting regressions to this with the model as response. The resulting model weight estimators are used as S in an ABC analysis. Theoretical results are given to justify this as approximating low dimensional sufficient statistics. A substantive application is presented: choosing between competing coalescent models of demographic growth for Campylobacter jejuni in New Zealand using multi-locus sequence typing data.

KW - ABC

KW - model selection

KW - sufficiency

KW - Campylobacter

KW - coalescent

U2 - 10.1515/sagmb-2013-0012

DO - 10.1515/sagmb-2013-0012

M3 - Journal article

VL - 13

SP - 67

EP - 82

JO - Statistical Applications in Genetics and Molecular Biology

JF - Statistical Applications in Genetics and Molecular Biology

SN - 2194-6302

IS - 1

ER -
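The two-stage procedure described in the abstract (a pilot regression of the model index on simulated data, whose fitted model weights then serve as the summary statistic S in ABC) can be sketched on a toy problem. This is a minimal illustration under assumptions of our own, not the authors' implementation: the two hypothetical models (a normal versus a uniform distribution with equal variance), the priors, the candidate statistics (mean, variance, kurtosis), logistic regression fitted by plain gradient ascent, and the "keep the nearest simulations" rejection rule are all illustrative choices, unrelated to the paper's coalescent application.

```python
import math
import random
from statistics import NormalDist

# Hypothetical toy setup, for illustration only:
#   model 0: x_i ~ Normal(theta, 1)
#   model 1: x_i ~ Uniform(theta - sqrt(3), theta + sqrt(3))  (also variance 1)
# Equal prior model weights; theta ~ Uniform(-2, 2) under both models (assumed).
N_OBS = 50
HALF_WIDTH = math.sqrt(3.0)


def simulate(model, theta, rng, n=N_OBS):
    if model == 0:
        return [rng.gauss(theta, 1.0) for _ in range(n)]
    return [rng.uniform(theta - HALF_WIDTH, theta + HALF_WIDTH) for _ in range(n)]


def draw_from_prior(rng):
    return rng.randrange(2), rng.uniform(-2.0, 2.0)


def features(x):
    """Candidate statistics used as regression covariates: mean, variance, kurtosis."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    kurt = sum((v - mean) ** 4 for v in x) / (n * var ** 2)
    return [mean, var, kurt]


def sigmoid(eta):
    # Numerically stable logistic function.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def fit_summary(n_pilot, rng, steps=300, lr=0.5):
    """Pilot step: simulate from all models and regress the model index on features.

    Returns a function S(x) giving the fitted probability of model 1.
    """
    X, y = [], []
    for _ in range(n_pilot):
        m, theta = draw_from_prior(rng)
        X.append(features(simulate(m, theta, rng)))
        y.append(m)
    d = len(X[0])
    # Standardise covariates so a single learning rate suits all of them.
    mu = [sum(r[j] for r in X) / n_pilot for j in range(d)]
    sd = [math.sqrt(sum((r[j] - mu[j]) ** 2 for r in X) / n_pilot) or 1.0 for j in range(d)]
    Z = [[(r[j] - mu[j]) / sd[j] for j in range(d)] for r in X]
    w = [0.0] * (d + 1)  # w[0] is the intercept
    for _ in range(steps):  # batch gradient ascent on the logistic log-likelihood
        grad = [0.0] * (d + 1)
        for z, t in zip(Z, y):
            err = t - sigmoid(w[0] + sum(wj * zj for wj, zj in zip(w[1:], z)))
            grad[0] += err
            for j in range(d):
                grad[j + 1] += err * z[j]
        w = [wj + lr * g / n_pilot for wj, g in zip(w, grad)]

    def summary(x):
        z = [(f - m_) / s for f, m_, s in zip(features(x), mu, sd)]
        return sigmoid(w[0] + sum(wj * zj for wj, zj in zip(w[1:], z)))

    return summary


def abc_model_choice(x_obs, summary, n_sims, n_keep, rng):
    """Rejection ABC: keep the n_keep simulations whose S(x) is closest to S(x_obs)
    and return the accepted fraction from model 1 as its posterior weight estimate."""
    s_obs = summary(x_obs)
    draws = []
    for _ in range(n_sims):
        m, theta = draw_from_prior(rng)
        draws.append((abs(summary(simulate(m, theta, rng)) - s_obs), m))
    draws.sort(key=lambda pair: pair[0])
    return sum(m for _, m in draws[:n_keep]) / n_keep


if __name__ == "__main__":
    rng = random.Random(1)
    S = fit_summary(n_pilot=1000, rng=rng)
    # Deterministic "observed" data: evenly spaced uniform points and normal quantiles.
    x_unif = [-HALF_WIDTH + 2 * HALF_WIDTH * (i + 0.5) / N_OBS for i in range(N_OBS)]
    x_norm = [NormalDist().inv_cdf((i + 0.5) / N_OBS) for i in range(N_OBS)]
    print(abc_model_choice(x_unif, S, 5000, 200, rng))
    print(abc_model_choice(x_norm, S, 5000, 200, rng))
```

Run as a script, the sketch prints an estimated posterior weight of model 1 for a uniform-like and a normal-like observed data set; the first should be high and the second low, though exact values depend on the seed and simulation sizes. Using the fitted probability as a one-dimensional S mirrors the abstract's idea of using model weight estimators as summary statistics, in place of the three raw candidate statistics.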