Posterior bootstrap: A scalable approach to Bayesian non-parametric learning

Providing an open-source tool to make non-parametric statistical inference faster and more accurate

Project status

Finished

Introduction

Statistical sampling in a Bayesian context has a speed-accuracy tradeoff: methods that sample from the exact statistical model are slow because they cannot be parallelised, while methods that can be parallelised are fast but only approximate. This project bridges the gap by implementing a method that can tune the dial between speed and accuracy. It takes the results from the approximate method and performs statistical sampling that is as close to the exact model as desired, while keeping the processing fast and parallel.

Explaining the science

Bayesian methods often use Markov chain Monte Carlo (MCMC) to obtain a posterior sample of the parameters of interest, with each draw in the chain serving as the starting point for the next. The chain is sequential and thus cannot benefit from the parallel processing of modern computers. 'Variational Bayes' instead approximates the posterior, typically assuming that the parameters of interest are independent of one another, which makes it faster but only approximate.

This project sidesteps the sequential paradigm and draws samples of the parameters from a 'frequentist regression' in which the data points are a combination of the observed data and synthetic data generated from the parameter estimates that need correction (if any). The weightings given to the observed and synthetic data govern the proximity of the sampled parameters to the observed data or to the previous model. Since each sample of the parameters is the result of a separate frequentist regression, the approach scales well: for example, if the user wants one thousand samples and has one thousand processors available, each processor can produce one sample very quickly.
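The idea in the paragraph above can be illustrated with a minimal sketch. It is not the project's R package: the function names, the straight-line model, the Gamma-distributed random weights (equivalent to Dirichlet weights after normalisation) and the `concentration` parameter are all assumptions made for illustration. Each draw fits one weighted regression to the observed data plus synthetic data simulated from an earlier parameter estimate, and `concentration` plays the role of the dial: at zero only the observed data count, and large values pull the draws towards the earlier model.

```python
import random


def weighted_line_fit(xs, ys, ws):
    """Closed-form weighted least squares for y = intercept + slope * x."""
    total = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def posterior_bootstrap_draw(seed, xs, ys, prior_fit, concentration,
                             n_synth=50, noise_sd=1.0):
    """One parameter sample; depends only on its seed and inputs.

    prior_fit is the (intercept, slope) estimate to be corrected, e.g. from
    an approximate method. All names here are hypothetical.
    """
    rng = random.Random(seed)
    intercept0, slope0 = prior_fit

    # Synthetic data simulated from the earlier parameter estimate.
    synth_x = [rng.choice(xs) for _ in range(n_synth)]
    synth_y = [intercept0 + slope0 * x + rng.gauss(0.0, noise_sd)
               for x in synth_x]

    # Random weights (assumed form): Gamma(1) per observed point and
    # Gamma(concentration / n_synth) per synthetic point. The fit is
    # invariant to the overall scale, so no normalisation is needed.
    w_obs = [rng.gammavariate(1.0, 1.0) for _ in xs]
    if concentration > 0:
        shape = concentration / n_synth
        w_syn = [rng.gammavariate(shape, 1.0) for _ in synth_x]
    else:
        w_syn = [0.0] * n_synth  # dial fully towards the observed data

    return weighted_line_fit(xs + synth_x, ys + synth_y, w_obs + w_syn)


# Toy data for illustration only.
xs = [float(i) for i in range(20)]
data_rng = random.Random(0)
ys = [1.0 + 2.0 * x + data_rng.gauss(0.0, 1.0) for x in xs]

# Draws are independent of one another, so this loop could be replaced by
# any parallel map, one seed per worker.
draws = [posterior_bootstrap_draw(seed, xs, ys, prior_fit=(0.0, 1.5),
                                  concentration=5.0)
         for seed in range(200)]
```

Because no draw depends on any other, the work divides evenly across however many processors are available, which is the scaling property described above.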

Project aims

The project is developing an open-source package in the programming language R that implements the method. It will be available on CRAN, so any researcher can easily replicate the results and benefit from parallel statistical sampling with adjustable proximity to the exact model.

Applications

This methodology can be applied in a number of situations, including:

  • Direct updating from utility functions in health data: where the modeller wants to perform some action or take a decision under a well-specified utility function.
  • Model misspecification in finance: where researchers have used a parametric Bayesian model and want to correct for the bias caused by model misspecification.
  • Approximate posteriors in meteorology: where for expediency researchers have used an approximate posterior, such as in variational Bayes (VB), and wish to account for the approximation.

 

Researchers and collaborators

Funders