Sampling from a posterior distribution using Markov Chain Monte Carlo (MCMC) methods is a central task in computational statistics with numerous applicatons in science and engineering. It can be a particularly challenging problem when the evaluation of the posterior distribution is computationally expensive and the parameter space and data are high-dimensional.
This project has developed and implemented a new MCMC method called Multi-Level Delayed Acceptence (MLDA) and embedded the code into the widely used PyMC3 probabilistic programming package (PPL). MLDA is designed for applications where we have access to a multilevel model hierarchy, i.e. we not only have the desired expensive posterior but also a set of approximate posteriors of decreasing computational cost.
MLDA can drastically accelerate inference in many real-world applications where spatial or temporal granularity can be adjusted to create multiple model levels, e.g. Bayesian inverse problems, image reconstruction, probabilistic machine learning, subsurface fluid transportation, predator-prey models.
Embedding MLDA into PyMC3 means that users can take advantage of the powerful PyMC3 API to define models. It also means that MLDA lives within a well maintained open-source framework supported by a large community.
Explaining the science
Sampling from a Bayesian posterior distribution using Markov Chain Markov Chain Monte Carlo (MCMC) methods (Fig. 1) can be a particularly challenging problem when the evaluation of the posterior is computationally expensive and the parameter space and data are high-dimensional. Many MCMC samples are required to obtain a sufficient representation of the posterior. Examples of such problems frequently occur in Bayesian inverse problems, image reconstruction and probabilistic machine learning, where calculating the likelihood depends on the evaluation of complex mathematical models (e.g. systems of partial differential equations) or large data sets.
Fig.1 - An MCMC algorithm drawing samples from a 2D distribution. In each step a proposed sample is generated and it is either accepted (blue) or rejected (green). If rejected, the previous sample of the chain is replicated. Samples occur more frequently in areas of high probability.
This project has developed and implemented an MCMC approach (MLDA) capable of accelerating existing sampling methods, where a hierarchy (or sequence) of computationally cheaper approximations to the 'full' posterior density are available. The idea is to use samples generated in one level as proposals for the level above. A subchain runs for a fixed number of iterations and then the last sample of the subchain is used as the proposal for the higher-level chain (Fig. 2).
Fig .2 - Three-level MLDA: For each step of the fine chain, a coarse subchain is created with length 3. The last sample of the subchain is used as a proposal for the fine chain.
The sampler is suitable for situations where a model is expensive to evaluate in high spatial or temporal resolution (e.g. realistic subsurface flow models). In those cases it is possible to use cheaper, lower resolutions as coarse models. If the approximations are sufficiently good, this leads to good quality proposals, high acceptance rates and high effective sample size compared to other methods.
In addition to the above main idea, two enhancements to increase efficiency have been developed. The first one is an adaptive error model (AEM). If a coarse model is a poor approximation of the fine model, subsampling on the coarse chain will cause MLDA to diverge from the true posterior, leading to low acceptance rates and effective sample size. The AEM corrects coarse model proposals employing a Gaussian error model, which effectively offsets, scales and rotates the coarse level likelihood function (Fig. 3). In practice, it estimates the relative error (or bias) of the coarser model for each pair of adjacent model levels, and corrects the output of the coarser model according to the bias.
Fig 3 - Adaptive Error Model (AEM): When the coarse likelihood function (red isolines) is a poor fit of the fine (blue contours), AEM offsets the coarse likelihood function (middle panel) using the mean of the bias, and scales and rotates it (right panel) using the covariance of the bias.
The second feature is a new Variance Reduction technique, which allows the user to specify a Quantity of Interest (QoI) before sampling, and calculates this QoI for each step at every level, as well as differences of QoI between levels. The set of QoI and QoI differences can be used to reduce the variance of the final QoI estimator.
- Develop and implement a new MCMC method (called MLDA) that outperforms existing methods when sampling a Bayesian posterior distribution in multilevel problems.
- Develop and implement enhancements to core MLDA, which include an adaptive error model for correcting the bias between fine and coarse models and a variance reduction technique that allows us to use samples from all model levels to reduce the variance of output estimates.
- Make MLDA available to the community by contributing to the widely used PyMC3. Python package and offering examples and documentation.
- Communicate the results via peer-reviewed journal publications and conference presentations.
To demonstrate the new sampler, a set of Jupyter Notebooks have been included in the PyMC3 code. These range from a simple linear regression problem to three inverse problems in ground water flow, ecology modelling and geological surveying. Click here to learn more.
MLDA is applicable in any problem where a multilevel hierarchy of models can be defined. This includes many Bayesian inverse problems like impedance imaging, ultrasound imaging, emission tomography, flow field of ocean circulation, subsurface fluid transportation, predator-prey models in ecology.
It is particularly effective in cases where the target model's likelihood is computationally expensive. In those cases, a significantly cheaper and coarser approximate model can be exploited by MLDA to produce the same posterior exploration with much fewer evaluations of the fine likelihood. Since PDEs are usually solved on some form of mesh, different levels of mesh coarseness can serve as a natural platform for establishing a model hierarchy.
Finally, MLDA can use any existing MCMC method as a base sampler, making it possible to apply existing non-hierarchical MCMC methods to a hierarchy of models.