Optimal allocation of Monte Carlo simulations to multiple hypothesis tests

School Of Mathematical Sciences

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Optimal allocation of Monte Carlo simulations to multiple hypothesis tests. / Hahn, G.
In: Statistics and Computing, Vol. 30, No. 3, 01.05.2020, p. 571-586.

Harvard

Hahn, G 2020, 'Optimal allocation of Monte Carlo simulations to multiple hypothesis tests', Statistics and Computing, vol. 30, no. 3, pp. 571-586. https://doi.org/10.1007/s11222-019-09906-9

APA

Hahn, G. (2020). Optimal allocation of Monte Carlo simulations to multiple hypothesis tests. Statistics and Computing, 30(3), 571-586. https://doi.org/10.1007/s11222-019-09906-9

Vancouver

Hahn G. Optimal allocation of Monte Carlo simulations to multiple hypothesis tests. Statistics and Computing. 2020 May 1;30(3):571-586. Epub 2019 Oct 5. doi: 10.1007/s11222-019-09906-9

Author

Hahn, G. / Optimal allocation of Monte Carlo simulations to multiple hypothesis tests. In: Statistics and Computing. 2020 ; Vol. 30, No. 3. pp. 571-586.

Bibtex

@article{52c3ad460eee4ed4ac93564de4828474,

title = "Optimal allocation of Monte Carlo simulations to multiple hypothesis tests",

abstract = "Multiple hypothesis tests are often carried out in practice using p-value estimates obtained with bootstrap or permutation tests since the analytical p-values underlying all hypotheses are usually unknown. This article considers the allocation of a pre-specified total number of Monte Carlo simulations $K \in \mathbb{N}$ (i.e., permutations or draws from a bootstrap distribution) to a given number of $m \in \mathbb{N}$ hypotheses in order to approximate their p-values $p \in [0,1]^m$ in an optimal way, in the sense that the allocation minimises the total expected number of misclassified hypotheses. A misclassification occurs if a decision on a single hypothesis, obtained with an approximated p-value, differs from the one obtained if its p-value was known analytically. The contribution of this article is threefold: under the assumption that $p$ is known and $K \in \mathbb{R}$, and using a normal approximation of the Binomial distribution, the optimal real-valued allocation of $K$ simulations to $m$ hypotheses is derived when correcting for multiplicity with the Bonferroni correction, both when computing the p-value estimates with or without a pseudo-count. Computational subtleties arising in the former case will be discussed. Second, with the help of an algorithm based on simulated annealing, empirical evidence is given that the optimal integer allocation is likely of the same form as the optimal real-valued allocation, and that both seem to coincide asymptotically. Third, an empirical study on simulated and real data demonstrates that a recently proposed sampling algorithm based on Thompson sampling asymptotically mimics the optimal (real-valued) allocation when the p-values are unknown and thus estimated at runtime.",

keywords = "Bonferroni correction, Multiple testing, Monte Carlo simulation, Optimal allocation, Thompson sampling, QuickMMCTest",

author = "G. Hahn",

year = "2020",

month = may,

day = "1",

doi = "10.1007/s11222-019-09906-9",

language = "English",

volume = "30",

pages = "571--586",

journal = "Statistics and Computing",

issn = "0960-3174",

publisher = "Springer Netherlands",

number = "3",

}

RIS

TY - JOUR

T1 - Optimal allocation of Monte Carlo simulations to multiple hypothesis tests

AU - Hahn, G.

PY - 2020/5/1

Y1 - 2020/5/1

N2 - Multiple hypothesis tests are often carried out in practice using p-value estimates obtained with bootstrap or permutation tests since the analytical p-values underlying all hypotheses are usually unknown. This article considers the allocation of a pre-specified total number of Monte Carlo simulations K ∈ ℕ (i.e., permutations or draws from a bootstrap distribution) to a given number of m ∈ ℕ hypotheses in order to approximate their p-values p ∈ [0, 1]^m in an optimal way, in the sense that the allocation minimises the total expected number of misclassified hypotheses. A misclassification occurs if a decision on a single hypothesis, obtained with an approximated p-value, differs from the one obtained if its p-value was known analytically. The contribution of this article is threefold: under the assumption that p is known and K ∈ ℝ, and using a normal approximation of the Binomial distribution, the optimal real-valued allocation of K simulations to m hypotheses is derived when correcting for multiplicity with the Bonferroni correction, both when computing the p-value estimates with or without a pseudo-count. Computational subtleties arising in the former case will be discussed. Second, with the help of an algorithm based on simulated annealing, empirical evidence is given that the optimal integer allocation is likely of the same form as the optimal real-valued allocation, and that both seem to coincide asymptotically. Third, an empirical study on simulated and real data demonstrates that a recently proposed sampling algorithm based on Thompson sampling asymptotically mimics the optimal (real-valued) allocation when the p-values are unknown and thus estimated at runtime.

AB - Multiple hypothesis tests are often carried out in practice using p-value estimates obtained with bootstrap or permutation tests since the analytical p-values underlying all hypotheses are usually unknown. This article considers the allocation of a pre-specified total number of Monte Carlo simulations K ∈ ℕ (i.e., permutations or draws from a bootstrap distribution) to a given number of m ∈ ℕ hypotheses in order to approximate their p-values p ∈ [0, 1]^m in an optimal way, in the sense that the allocation minimises the total expected number of misclassified hypotheses. A misclassification occurs if a decision on a single hypothesis, obtained with an approximated p-value, differs from the one obtained if its p-value was known analytically. The contribution of this article is threefold: under the assumption that p is known and K ∈ ℝ, and using a normal approximation of the Binomial distribution, the optimal real-valued allocation of K simulations to m hypotheses is derived when correcting for multiplicity with the Bonferroni correction, both when computing the p-value estimates with or without a pseudo-count. Computational subtleties arising in the former case will be discussed. Second, with the help of an algorithm based on simulated annealing, empirical evidence is given that the optimal integer allocation is likely of the same form as the optimal real-valued allocation, and that both seem to coincide asymptotically. Third, an empirical study on simulated and real data demonstrates that a recently proposed sampling algorithm based on Thompson sampling asymptotically mimics the optimal (real-valued) allocation when the p-values are unknown and thus estimated at runtime.

KW - Bonferroni correction

KW - Multiple testing

KW - Monte Carlo simulation

KW - Optimal allocation

KW - Thompson sampling

KW - QuickMMCTest

U2 - 10.1007/s11222-019-09906-9

DO - 10.1007/s11222-019-09906-9

M3 - Journal article

VL - 30

SP - 571

EP - 586

JO - Statistics and Computing

JF - Statistics and Computing

SN - 0960-3174

IS - 3

ER -
