Each advance in methods for describing the parameters of models using Markov chain Monte Carlo (MCMC) algorithms has reduced the effectiveness of their visualisation. This project is developing visualisation tools designed for multiple-chain, parameter-rich models that produce vast numbers of samples. By incorporating concepts such as ‘mixing’ and statistical diagnostics into the visualisations’ designs, the project aims to increase data scientists’ ability to interrogate complex models.
Explaining the science
An MCMC analysis may be run multiple times, producing thousands of samples for every parameter in the model. A single chain of MCMC samples can be very difficult to visualise as thousands of samples need to be displayed, producing dense plots that are difficult, if not impossible, to ‘read’.
Visualising many chains, across multiple parameters, has proved an even greater challenge. This project's designs take on that challenge by articulating statistical concepts in the visual design and emphasising diagnostic features that support the work of the diverse users who rely on MCMC techniques.
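To make the density problem concrete, the sketch below collapses a single chain into a small number of windowed summaries (5% quantile, median, 95% quantile) that could be drawn as a band instead of thousands of individual points. It is an illustration only, not one of the project's designs: the function name `summarise_trace`, the window size and the simulated chain are all assumptions made for this example.

```python
import random
import statistics


def summarise_trace(samples, window=100):
    """Collapse a chain into (start, lower, median, upper) rows, one per window.

    Hypothetical helper for illustration. A trailing partial window is dropped.
    """
    if window < 2:
        raise ValueError("window must contain at least two samples")
    rows = []
    for start in range(0, len(samples) - window + 1, window):
        block = samples[start:start + window]
        # n=20 yields 19 cut points: the first is the 5% quantile,
        # the last is the 95% quantile.
        cuts = statistics.quantiles(block, n=20)
        rows.append((start, cuts[0], statistics.median(block), cuts[-1]))
    return rows


# Simulated stand-in for one MCMC chain (assumption: 5,000 draws).
rng = random.Random(0)
chain = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# 5,000 points reduce to 50 rows that can be plotted as a band.
bands = summarise_trace(chain, window=100)
```

Each row gives the starting iteration of a window and the three summary values, so a plot needs one band segment per window instead of one mark per sample.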
Following semi-structured interviews with MCMC users and a review of currently available visualisation tools, the project has produced new designs that focus on interrogating individual parameter samples over multiple chains, and on visualising multiple parameters in terms of key concepts such as ‘mixing’, which indicates how confident a data scientist can be in those samples. A key goal is to make designs that are worth using and also worth keeping, as these visualisations are a key step in model justification yet usually go unpublished.
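One widely used numerical summary of mixing across chains is the Gelman-Rubin potential scale reduction factor, often written R-hat, which compares the variance between chains with the variance within them. The sketch below implements the classical (non-split) form as an illustration of the concept; it is not taken from the project's tools, and the function name and the simulated chains are assumptions made for this example.

```python
import math
import random
import statistics


def gelman_rubin_rhat(chains):
    """Classical Gelman-Rubin R-hat for one parameter.

    `chains` is a list of equal-length lists of samples, one list per chain.
    Values close to 1 suggest the chains agree; larger values suggest poor mixing.
    """
    m = len(chains)
    n = len(chains[0]) if m else 0
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B / n: sample variance of the chain means.
    between_over_n = statistics.variance(chain_means)
    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between_over_n
    return math.sqrt(pooled / within)


# Simulated stand-ins for MCMC output (assumption: 4 chains of 1,000 draws).
rng = random.Random(0)
well_mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Shift two of the chains so they explore a different region.
offsets = [0.0, 0.0, 3.0, 3.0]
poorly_mixed = [[x + off for x in chain] for chain, off in zip(well_mixed, offsets)]

rhat_good = gelman_rubin_rhat(well_mixed)
rhat_bad = gelman_rubin_rhat(poorly_mixed)
```

In this simulation the well-mixed chains give a value close to 1, while the shifted chains give a value well above it. A single number like this hides where and when chains disagree, which is part of the motivation for visualising mixing directly.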
The project aims to reconsider the design of MCMC visualisations and develop an easy-to-use R package.
Visualisations play a key role in interrogating MCMC analyses, as there is no absolute diagnostic that confirms whether the MCMC has settled on a robust result or identifies what the issues with the analysis are. Data scientists typically use an individualistic selection of statistical and visual diagnostics to understand the properties of the MCMC outputs and to inform modelling decisions about the MCMC settings, model structure and computational methods.
The short-term aim is to allow MCMC modellers to see new dimensions of their models. In the longer term, the project's researchers are keen to explore standardised outputs that support decision-making in modelling and that exploit generalisable aspects of the designs produced.
MCMC is used in a wide variety of domains, spanning data science, finance, engineering, and the life and physical sciences.
The designs in this project are not only useful for MCMC: they may also have applications in other optimisation or parametrisation settings where large numbers of samples are produced.