Introduction

The Turing-RSS Health Data Lab is a working partnership between The Alan Turing Institute and the Royal Statistical Society (RSS), supporting the UK Health Security Agency (UKHSA).

We provide an independent source of statistical modelling and machine learning expertise to address policy-relevant research questions.

Established in August 2020, the Turing-RSS Health Data Lab has focused on conducting world-leading statistical research to meet the needs of current and future health surveillance systems.

Projects to date have included investigating social inequalities in COVID-19 risk, methods for de-biasing routine testing data, transmission and mobility modelling, the use of wastewater as a biomarker for local prevalence, COVID-19 genomics, and a rigorous assessment of a biomedical acoustic marker as a COVID-19 diagnostic.

This collaboration bridges the gap between rapid-response analysis and longer-term research projects whilst establishing an innovative, interoperable modelling approach that allows our methods and algorithms to be transferable, sustainable and reusable in future projects.

Visit our 'project overviews' page for a jargon-free review of the projects listed above.


Published Papers

Bayesian imputation of COVID-19 positive test counts for nowcasting under reporting lag

Spatial and temporal modelling of incidence and prevalence of COVID-19

Estimating COVID-19 prevalence and transmission from multiple sources: de-biasing Pillar 2 data  

Interoperability of Statistical Models in Pandemic Preparedness: Principles and Reality


International Lecture Series 2022

Visit our events page for access to past recordings and to register for the next International Lecture.

Videos from the lectures can be found on YouTube.

 

Explaining the science

You can learn more about the individual projects listed above by visiting the project overview page.

The overviews are jargon-free and written as a quick and easy guide to the work taking place at the Turing-RSS Health Data Lab.

Contact info

To learn more or to get involved with our project work, contact the team at [email protected]

Follow us on Twitter @turingrss_hdlab

Researchers and collaborators

Professor Peter Diggle

Distinguished Professor, CHICAS, Lancaster University and Steering Group Mentor, RSS COVID-19 Taskforce

Ali Marsh

Senior Programme Manager, Health and Medical Sciences (Maternity Leave)

Dr Brieuc Lehmann

Assistant Professor (University College London) and member of the Turing-RSS Lab

Past and current projects

Nowcasting positive test counts with reporting lag

Principal Investigators 

Dr Radka Jersakova, The Alan Turing Institute 
Dr James Lomax, National Cyber Security Centre and The Alan Turing Institute 

Research team 

Mark Briers, The Alan Turing Institute 

James Hetherington, University College London 

Chris Holmes, The Alan Turing Institute 

Brieuc Lehmann, University College London 

George Nicholson, University of Oxford 

Overview 

To monitor the current state of COVID-19, the UK government tracks the number of positive tests in each local authority. Since it takes time to process PCR swab tests, there is a delay of up to five days before all positive test results are reported. 

The goal of this project is to 'nowcast' the number of daily positive test counts up to the present date. A 'nowcast' is a prediction informed by analysis of data currently available. Using statistical models, we can infer the expected final count using the incomplete data as it arrives. This estimate can be used to make up for the lag in data reporting and aid decision making. 
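The idea of inferring a final count from incomplete data can be illustrated with a minimal sketch. This is not the Lab's Bayesian imputation model: it is a much simpler scaling correction, and it assumes that the typical fraction of tests already reported at each lag is known. The function name `nowcast_counts` and all numbers below are hypothetical.

```python
def nowcast_counts(reported, completeness):
    """Scale partially reported daily counts up to an expected final count.

    reported: positive test counts reported so far, ordered oldest to newest.
    completeness: assumed fraction of the final count that has typically
        been reported by now for each day (hypothetical values, same order).
    """
    if len(reported) != len(completeness):
        raise ValueError("reported and completeness must have equal length")
    estimates = []
    for count, fraction in zip(reported, completeness):
        if not 0 < fraction <= 1:
            raise ValueError("completeness fractions must be in (0, 1]")
        # A day that is 40% reported is scaled up by 1 / 0.4.
        estimates.append(count / fraction)
    return estimates


# Hypothetical example: the five most recent days, oldest first.
reported = [120, 110, 90, 60, 20]
completeness = [1.0, 0.96, 0.9, 0.75, 0.4]
print([round(x) for x in nowcast_counts(reported, completeness)])
```

A point estimate like this carries no measure of uncertainty, which is one reason a full statistical model is preferable for decision making.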

Timeframe 

Project completed 

Outputs 

Technical report pre-print now available – Bayesian imputation of COVID-19 positive test counts for nowcasting under reporting lag. 
 
View the code on GitHub

Spatial and temporal modelling of incidence and prevalence of COVID-19

Principal Investigator

Professor Marta Blangiardo, Imperial College London

Research team 

Tullia Padellini, Imperial College London 
Radka Jersakova, The Alan Turing Institute 
Peter Diggle, Lancaster University 
Chris Holmes, The Alan Turing Institute 
Brieuc Lehmann, University College London 
Ruairidh King, MRC Harwell   
Ann-Marie Mallon, The Alan Turing Institute 
George Nicholson, University of Oxford 
Sylvia Richardson, University of Cambridge 
Luis Santos, MRC Harwell  

Overview 

Our goal is to estimate the prevalence of COVID-19, combining several sources of data, accounting for their biases and uncertainty.  

We aim to predict the burden of the disease by integrating two different types of information on the number of cases: direct estimates (such as randomized surveys and testing programs) and indirect estimates (such as hospital admissions). We provide a flexible modelling framework, which is adjusted for known risk factors and accounts for spatial as well as temporal dependencies in our data.

Timeframe 

Project completed 

Outputs 

Research paper:

Padellini, T., Jersakova, R., Diggle, P.J., Holmes, C., King, R.E., Lehmann, B.C., Mallon, A.M., Nicholson, G., Richardson, S. and Blangiardo, M., 2022. Time varying association between deprivation, ethnicity and SARS-CoV-2 infections in England: A population-based ecological study. The Lancet Regional Health-Europe, p.100322. 

Blog 

South Asians in poorer areas more at risk of catching COVID-19 

Estimating COVID-19 prevalence and transmission from multiple sources: de-biasing Pillar 2 data

Principal Investigators 

George Nicholson, University of Oxford  
Brieuc Lehmann, University of Oxford 

Research team 

Tullia Padellini, Imperial College London 
Koen Pouwels, University of Oxford
Radka Jersakova, The Alan Turing Institute 
James Lomax, The Alan Turing Institute 
Ruairidh King, MRC Harwell 
Ann-Marie Mallon, The Alan Turing Institute 
Luis Santos, MRC Harwell  
Izzy Russell, MRC Harwell 
Peter Diggle, Lancaster University 
Sylvia Richardson, University of Cambridge 
Marta Blangiardo, Imperial College London 
Chris Holmes, The Alan Turing Institute 

Overview 

Our goal is to estimate COVID-19 prevalence and transmission rates at a fine-scale level, such as local authority, by harnessing data from multiple testing sources. We are designing a statistical model to adaptively adjust for biases and coherently combine information across multiple data streams. 

Background 

The daily or weekly number of positive COVID-19 tests in a region is widely used as a proxy for the local number of infected individuals.  

Multiple testing sources 

Positive test numbers arise from:  

Randomized surveillance (REACT study, ONS survey). 

Pillar 2 testing focused on testing symptomatic individuals. 

Local mass testing at the level of cities, universities, care homes etc.   

 

Testing bias 

Test results are subject to sampling and operational influences:

Ascertainment bias: symptomatic individuals are prioritized for testing, so the rate of positive tests is greater than the actual disease prevalence in the population. 

False positive/negative test results: tests for COVID-19 infection, such as PCR and lateral flow, vary in sensitivity and specificity. 

Weekday effects: the number of tests performed depends strongly on the day of the week.
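As a small illustration of the second item above, the sketch below corrects an observed positive rate for imperfect test sensitivity and specificity using the standard Rogan-Gladen estimator. It is not the Lab's debiasing framework and does nothing about ascertainment bias or weekday effects. The function name `adjust_prevalence` and the example values are assumptions chosen for illustration.

```python
def adjust_prevalence(positive_rate, sensitivity, specificity):
    """Rogan-Gladen correction of an observed positive rate.

    positive_rate: fraction of tests that came back positive.
    sensitivity: probability that an infected person tests positive.
    specificity: probability that an uninfected person tests negative.
    """
    denominator = sensitivity + specificity - 1
    if denominator <= 0:
        raise ValueError("test must perform better than chance")
    estimate = (positive_rate + specificity - 1) / denominator
    # Keep the result inside the valid range for a proportion.
    return min(1.0, max(0.0, estimate))


# Hypothetical values: 5% of tests positive, 80% sensitivity, 99% specificity.
print(adjust_prevalence(0.05, sensitivity=0.80, specificity=0.99))
```

The correction is only meaningful when the tested group is representative of the population, which is the case for randomized surveillance but not for symptom-driven testing.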

Timeframe 

The initial project is now complete, and the model is being applied to other project datasets.

Outputs 

Research Paper 

Nicholson, G., Lehmann, B., Padellini, T. et al. Improving local prevalence estimates of SARS-CoV-2 infections using a causal debiasing framework. Nat Microbiol 7, 97–107 (2022).  

Blog 

Why COVID-19 test data is skewed, and what we're doing to fix it.

Detecting COVID-19 using biomedical acoustic markers 

Principal Investigators 

Steven Gilmour, King's College London
Davide Pigoli, King's College London
Stephen Roberts, University of Oxford 
Bjoern Schuller, Imperial College London 

Research team

Kieran Baker, King's College London
Jobie Budd, University College London 
Harry Coppock, Imperial College London 
Chris Holmes, The Alan Turing Institute 
Ivan Kiskin, University of Surrey 
Vasiliki Koutra, King's College London
George Nicholson, University of Oxford  

Overview 

The biomedical acoustic markers project aims to develop a process to identify features in audio signals (voice and speech sounds) that are caused by COVID-19. This has the potential to be a fast and easy-to-use early test, paving the way for mass testing. It also has the potential, in due course, to be used for the early detection of other diseases.

Our work builds on early-stage research from Cambridge and other research groups, which reported how an algorithm (a sequence of rules that a computer uses to complete a task) could accurately identify COVID-19 positive patients who had no symptoms using audio recordings of coughs from a small test group.

To evaluate the possibility that COVID-19 results in unique features in individuals’ speech and airway sounds, we have collected a world-leading respiratory sounds COVID-19 dataset. It is superior to previous datasets thanks to the number of recordings collected, the richness of the metadata (information about the dataset) and the quality of the ground truth labels (data that demonstrates COVID-19 status is the correct diagnosis).

On top of this, we have carefully created two subsets of the dataset to evaluate the performance of the model. The first set, known as the training set, is what we let the model see and learn from. The second set, known as the test set, is how we evaluate the performance of the model. We have carefully created these partitions to address the bias in the dataset. Most importantly, the test set is curated to feature matched pairs of individuals. The two individuals in each pair share the same characteristics (for example, symptoms) except for their COVID-19 status. Therefore, if we classify a pair correctly, we are more confident that this is due to true COVID-19 audio signals, rather than other symptoms.
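The matched-pair idea can be sketched in a few lines. This is an illustration only, not the procedure used to curate the Lab's test set: it assumes each participant is a dictionary, and the field names (`age_band`, `has_cough`, `covid_positive`) and the helper `build_matched_pairs` are hypothetical.

```python
from collections import defaultdict


def build_matched_pairs(participants, match_keys):
    """Pair each positive participant with a negative one that has the
    same values for every field in match_keys."""
    negatives = defaultdict(list)
    for person in participants:
        if not person["covid_positive"]:
            key = tuple(person[k] for k in match_keys)
            negatives[key].append(person)

    pairs = []
    for person in participants:
        if person["covid_positive"]:
            key = tuple(person[k] for k in match_keys)
            if negatives[key]:
                # Each negative participant is used in at most one pair.
                pairs.append((person, negatives[key].pop()))
    return pairs


# Hypothetical participants.
participants = [
    {"id": 1, "age_band": "30-39", "has_cough": True, "covid_positive": True},
    {"id": 2, "age_band": "30-39", "has_cough": True, "covid_positive": False},
    {"id": 3, "age_band": "60-69", "has_cough": False, "covid_positive": True},
]
for positive, negative in build_matched_pairs(participants, ["age_band", "has_cough"]):
    print(positive["id"], "matched with", negative["id"])
```

Positive participants with no matching negative are left unpaired, so a matched test set is usually smaller than the pool it is drawn from.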

If our study proves positive, this algorithm, released as a smartphone app, has potential as a rapid and affordable screening tool for COVID-19 and possibly other diseases.

Timeframe 

From December 2020. 

 

Using wastewater data to monitor COVID-19

Principal Investigator 

Marta Blangiardo, Imperial College London 
 

Research team

Peter Diggle, Lancaster University 
Helen Duncan, The Alan Turing Institute 
Philip Li, Northumbria University 
Callum Mole, The Alan Turing Institute 
George Nicholson, University of Oxford  
Camila Rangel Smith, The Alan Turing Institute 
Sylvia Richardson, University of Cambridge
Barry Rowlingson, Lancaster University 
Fatemeh Torabi, University of Swansea

Overview 

People infected with COVID-19, with or without symptoms, shed the virus through their digestive systems or during daily activities, and it ends up in wastewater. This process is called shedding. It is now known that as the number of COVID-19 patients in an area increases, the number of virus particles detected in wastewater (the viral load) also increases. Wastewater is tested at wastewater plants as part of the regular testing of wastewater samples.

The Environmental Monitoring for Health Protection (EMHP) wastewater monitoring programme, led by the UK Health Security Agency, tests wastewater on a daily basis. It started in mid-2020 and continues to gather data from 270 sites across England.

This project seeks to use these data to address research questions such as:  

  • How can estimates of disease frequency from wastewater data at specific points in time be used alongside more commonly used health monitoring data?
  • Does wastewater data add value to monitoring diseases?  
  • And how can we best design wastewater sampling schemes for real-time monitoring, either using only wastewater data, or combined with traditional monitoring data in a cost-effective manner?  

During the first phase of the project, the team will focus on the first of these research questions and also work to identify priorities for future research. The data will first be explored, and the team will then conduct analysis related to different time periods and different places in the United Kingdom.

Timeframe 

From November 2021 

Investigating transmission of COVID-19 using mobility data 

Principal Investigator 

Yee Whye Teh, University of Oxford

Research Team

Helen Duncan, The Alan Turing Institute 
Tor Erlend Fjelde, University of Cambridge 
Hong Ge, University of Cambridge 
Michael Hutchinson, University of Oxford 
Radka Jersakova, The Alan Turing Institute 
Callum Mole, The Alan Turing Institute 
George Nicholson, University of Oxford  
Camila Rangel Smith, The Alan Turing Institute 
Sylvia Richardson, University of Cambridge

Overview 

This project aims to improve our understanding of how people’s movement affects the spread of the COVID-19 virus. This work has the potential to provide insight for policy makers on, for example, the likely impact of travelling outside of a person's local area on controlling the spread of the virus.

We will create a high-quality infectious disease transmission model (a model is a framework to show the relationship between variables in a dataset) that uses real-time mobility data. This work builds on a space and time model (the Epimap model), previously developed by our team, which produces local estimates of transmission. The model includes consideration of other factors such as population density and data that captures social and economic deprivation, vaccination coverage and information on variants.

Outputs from this project, such as the model, will be open source (accessible to all) to generate discussion, increase the transparency of this work, and make it easier for other researchers to reuse and for policy makers to access.

Timeframe 

From August 2021. 

 

Investigating COVID-19 transmission using gene sequencing (Genomics+)

Principal Investigator 
Ewan Birney, European Bioinformatics Institute 

Research Team 
Tom Fitzgerald, European Bioinformatics Institute 
Kumar Gaurav, European Bioinformatics Institute 
Chris Holmes, The Alan Turing Institute 

Timeframe 
From January 2022 

Goals

The Turing-RSS Health Data Lab focuses on co-developed research projects that support both the work of the UK Health Security Agency (UKHSA) and the need for rigorous research that adds value to the existing body of knowledge by:

  • Providing independent, rigorous modelling and analysis to deliver new insights in the evolving fight against COVID-19.
  • Providing further understanding of COVID-19 to the public and wider scientific community.
  • Enhancing capacity within the UKHSA to better forecast and model current and future epidemics.

Building a community

Community-building connects external expertise with the COVID-19 data science and analytics that takes place in the UKHSA. Activities will identify key and cutting-edge models and analytics, support academic input and discussion between different producer and user communities, and help visibility and understanding of the contribution of data science outside and inside government, addressing the evolving picture of COVID-19.

Working values

The focus of the Turing-RSS Health Data Lab is on supporting the government to undertake responsible and risk-aware design, development and deployment of statistical and mathematical modelling and machine learning.

The Turing-RSS Health Data Lab believes that critical assessment of its work by the research community and the public at large will help to improve the quality of any advice that it provides. Critical assessment through open science is an important tenet of this collaboration.

The Turing and RSS are able to draw upon a wide range of expertise through open call, to ensure recruitment of the most talented data scientists, with diverse backgrounds and experience.

All algorithms are to be designed and developed in a transparent and reproducible manner and delivered with sufficient detail that external research teams can replicate results if they have access to the dataset. This requires:

  • sharing the algorithms and methods publicly, so that anyone may examine them and share their insights and input with the Institute and RSS;
  • speaking openly in relation to our work, in line with the Turing’s status as an Independent Research Organisation;
  • adhering to ethical research principles, worthy of public trust, justifiable, fair and non-discriminatory.

Find out more

If you have any questions about this work, please contact [email protected].