Merging MCMC subposteriors through Gaussian-Process Approximations

School of Mathematical Sciences

Associated organisational units

Text available via DOI:

https://doi.org/10.1214/17-BA1063
Final published version

Keywords

stat.CO, stat.ML, Big data, Markov chain Monte Carlo, Gaussian processes, distributed importance sampling

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Merging MCMC subposteriors through Gaussian-Process Approximations. / Nemeth, Christopher ; Sherlock, Christopher Gerrard.
In: Bayesian Analysis, Vol. 13, No. 2, 03.2018, p. 507-530.

Bibtex

@article{ebdb2177e94d4a81947ae2fe63c500b3,

title = "Merging MCMC subposteriors through Gaussian-Process Approximations",

abstract = "Markov chain Monte Carlo (MCMC) algorithms have become powerful tools for Bayesian inference. However, they do not scale well to large-data problems. Divide-and-conquer strategies, which split the data into batches and, for each batch, run independent MCMC algorithms targeting the corresponding subposterior, can spread the computational burden across a number of separate computer cores. The challenge with such strategies is in recombining the subposteriors to approximate the full posterior. By creating a Gaussian-process approximation for each log-subposterior density we create a tractable approximation for the full posterior. This approximation is exploited through three methodologies: firstly, a Hamiltonian Monte Carlo algorithm targeting the expectation of the posterior density provides a sample from an approximation to the posterior; secondly, evaluating the true posterior at the sampled points leads to an importance sampler that, asymptotically, targets the true posterior expectations; finally, an alternative importance sampler uses the full Gaussian-process distribution of the approximation to the log-posterior density to re-weight any initial sample and provide both an estimate of the posterior expectation and a measure of the uncertainty in it.",

keywords = "stat.CO, stat.ML, Big data, Markov chain Monte Carlo, Gaussian processes, distributed importance sampling",

author = "Nemeth, Christopher and Sherlock, {Christopher Gerrard}",

year = "2018",

month = mar,

doi = "10.1214/17-BA1063",

language = "English",

volume = "13",

pages = "507--530",

journal = "Bayesian Analysis",

issn = "1936-0975",

publisher = "Carnegie Mellon University",

number = "2",

}

RIS

TY - JOUR

T1 - Merging MCMC subposteriors through Gaussian-Process Approximations

AU - Nemeth, Christopher

AU - Sherlock, Christopher Gerrard

PY - 2018/3

Y1 - 2018/3

N2 - Markov chain Monte Carlo (MCMC) algorithms have become powerful tools for Bayesian inference. However, they do not scale well to large-data problems. Divide-and-conquer strategies, which split the data into batches and, for each batch, run independent MCMC algorithms targeting the corresponding subposterior, can spread the computational burden across a number of separate computer cores. The challenge with such strategies is in recombining the subposteriors to approximate the full posterior. By creating a Gaussian-process approximation for each log-subposterior density we create a tractable approximation for the full posterior. This approximation is exploited through three methodologies: firstly, a Hamiltonian Monte Carlo algorithm targeting the expectation of the posterior density provides a sample from an approximation to the posterior; secondly, evaluating the true posterior at the sampled points leads to an importance sampler that, asymptotically, targets the true posterior expectations; finally, an alternative importance sampler uses the full Gaussian-process distribution of the approximation to the log-posterior density to re-weight any initial sample and provide both an estimate of the posterior expectation and a measure of the uncertainty in it.

AB - Markov chain Monte Carlo (MCMC) algorithms have become powerful tools for Bayesian inference. However, they do not scale well to large-data problems. Divide-and-conquer strategies, which split the data into batches and, for each batch, run independent MCMC algorithms targeting the corresponding subposterior, can spread the computational burden across a number of separate computer cores. The challenge with such strategies is in recombining the subposteriors to approximate the full posterior. By creating a Gaussian-process approximation for each log-subposterior density we create a tractable approximation for the full posterior. This approximation is exploited through three methodologies: firstly, a Hamiltonian Monte Carlo algorithm targeting the expectation of the posterior density provides a sample from an approximation to the posterior; secondly, evaluating the true posterior at the sampled points leads to an importance sampler that, asymptotically, targets the true posterior expectations; finally, an alternative importance sampler uses the full Gaussian-process distribution of the approximation to the log-posterior density to re-weight any initial sample and provide both an estimate of the posterior expectation and a measure of the uncertainty in it.

KW - stat.CO

KW - stat.ML

KW - Big data

KW - Markov chain Monte Carlo

KW - Gaussian processes

KW - distributed importance sampling

U2 - 10.1214/17-BA1063

DO - 10.1214/17-BA1063

M3 - Journal article

VL - 13

SP - 507

EP - 530

JO - Bayesian Analysis

JF - Bayesian Analysis

SN - 1936-0975

IS - 2

ER -
