Introduction
Turing.jl is a general-purpose probabilistic programming language (PPL) written entirely in Julia. It originated in the UK and is approachable for researchers familiar with Matlab, Python and R. Turing.jl has been used extensively in applied data science and is growing rapidly: it was downloaded more than 16,000 times in 2022, has 1,900 GitHub stars, and has over 300 citations. In this project, we plan to extend Turing.jl to:
- facilitate collaborative, multi-dataset analyses through compositionality and modularity;
- enable PPL-based Bayesian analysis of very large datasets through massive parallelisation;
- improve usability through community consultation that informs better application programming interface (API) design.
Project aims
Our programme of work is spread across three work packages (WPs):
- WP1: Compositional and modular probabilistic programming.
We plan to develop easy-to-use functionality in Turing.jl for composition at two levels: composing sub-model architectures within a single Turing.jl model, and probabilistically composing multiple separate Turing.jl models.
- WP2: Scalability and speed improvements.
We plan to transform the scalability and speed of Turing.jl by (A) improving automatic differentiation (AD) through integration with EnzymeAD; and (B) augmenting inference with massively parallel sampling strategies (multithreading, distributed computing, GPU) for simulation-based inference methods such as particle filtering and Markov chain Monte Carlo (MCMC).
- WP3: Community projects, usability and educational materials.
We plan to enhance the user interface, user experience and API, and to create secondary school- and undergraduate-level courses, building on curriculum development research with the Cambridge Mathematics Project, with a focus on user accessibility and educational outreach. We will conduct comprehensive user studies of Turing.jl's usability, improve its syntactic and interactive features, and develop intuitive tools and teaching methods for Bayesian statistics at secondary school level.