Introduction
The Data for Research Programme is a newly onboarded team focussing on delivering research-ready data in a strategic and systematic manner. Provisioning data is often the single largest hurdle for analytical projects. The team aims to sit at the interface between data generators, domain experts and data analysts, to understand how the data will be used in the context of specific research questions, and to develop the data coordination and management frameworks needed for an efficient and robust data delivery platform across the Institute. We use tools, standards and processes to integrate and provision complex data within a framework designed for AI. Making data ready for research can involve many barriers and bottlenecks, and the team is keen to build partnerships and to identify projects that are dealing with these challenges.
Key areas where the Data for Research Programme contributes:
- Data Readiness: data integration and added value; knowledge of the data and the standards it adheres to
- Data Lifecycle: developing pipelines and processes for delivering research-ready data; data releases for reproducibility; data provenance with end-to-end tracking from source to analyst
- Data Characteristics: representativeness, accuracy, completeness, accessibility, coverage
- Data Integration: adding value through the unification and harmonisation of datasets
- Data Quality and Standards: complying with domain-specific standards; ensuring the usability of data through metadata
- Data Security: methods for anonymisation; compliance with governance policies
- Supporting all project stakeholders and processes throughout the data lifecycle
The team is also a contributor to and advocate of The Turing Way, an open, community-led handbook to reproducible, ethical and collaborative data science.
The team is fondly known as the Data Wranglers, and you can read a chapter of The Turing Way book which describes this job role.
A Data Wrangler collaborates with multiple specialists to provide analysis/research-ready data whilst upholding data privacy and domain-specific standards. Created by Scriberia with The Turing Way community. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807
Projects
AIM RSF - AI for Multiple Long-Term Conditions Research Support Facility
A Research Support Facility (RSF) based at the Turing, in conjunction with Swansea University and the University of Edinburgh, is funded by the NIHR within the Artificial Intelligence for Multiple Long-Term Conditions (AIM) programme. The RSF was created to embed best practices in data security and standards, reproducibility, and public and patient engagement across all the research collaborations funded by the AIM programme throughout the UK. Data Wranglers from the Data for Research Programme work within the RSF's 'accessible, research-ready data' theme. Their aim is to align datasets and standards across this large research collaboration, improving data quality and producing data outputs that can benefit research within the AIM programme and beyond.
More information:
- www.nihr.ac.uk/explore-nihr/funding-programmes/ai-award.htm
- www.turing.ac.uk/research/research-projects/ai-multiple-long-term-conditions-research-support-facility
NATS / Bluebird
NATS has joined The Alan Turing Institute in a partnership, supported by investment from the EPSRC, to research the application of artificial intelligence to air traffic control. The vision is to deliver the world's first AI air traffic control system to control a section of airspace in live trials, working alongside air traffic controllers to help address the complexities of their role.
More information:
EDoN – Early Detection of Neurodegenerative Diseases
The EDoN Initiative aims to develop a digital tool that will be able to detect neurodegenerative diseases potentially decades earlier than is currently possible. EDoN will collect digital data from technology including wearables and apps, alongside clinical data, to develop digital fingerprints that allow for the early identification of disease. The Alan Turing Institute works as part of a global team bringing together expertise in data science, digital technologies and neurodegenerative disease.
More information:
- www.turing.ac.uk/blog/ai-and-alzheimers-how-turing-pioneering-new-research-devastating-disease
- https://edon-initiative.org
Oxford BDI – Novartis collaboration
Novartis and the University of Oxford's Big Data Institute (BDI) established a research alliance with the aim of improving health care and drug development by making them more efficient and targeted. By combining the latest statistical machine learning technology with an innovative IT platform (developed to manage large volumes of anonymised data from numerous sources and of many types), the alliance aims to find clinically relevant patterns that humans alone cannot detect, and so identify phenotypes and early predictors of patient disease activity and progression. The Data Wranglers developed the data platform and manage the pipeline of data from Novartis to the analyst community. The team set up this project before joining the Turing and brought it with them to enable skills development and collaboration. A team of Data Wranglers is based at Oxford's BDI.
More information: www.bdi.ox.ac.uk/news/bdi-novartis-partnership-is-announced
IMPC
The International Mouse Phenotyping Consortium (IMPC) is an international effort by 21 research institutions to identify the function of every protein-coding gene in the mouse genome. The NIH-funded informatics consortium of EMBL-EBI, QMUL and MRC Harwell developed the web portal and the underpinning data coordination centre, which manages the data from 10 international centres and unifies it into a single point of access. Dedicated Data Wranglers work with each data-generating centre to collate data and perform quality control. All of the data and annotations are provided to the scientific community through an open web portal. The team set up this project before joining the Turing and brought it with them to enable skills development and collaboration.
More information: www.mousephenotype.org/about-impc/
DECOVID
DECOVID will store detailed and frequently updated health data from hospitals as the COVID-19 pandemic unfolds, to allow clinicians and researchers to generate rapid and robust insights that can lead to more effective clinical treatment strategies, helping patients, healthcare professionals and society.
More information: www.turing.ac.uk/research/research-projects/decovid
Turing-RSS Health Data Lab
The Turing-RSS Health Data Lab is a working partnership between The Alan Turing Institute and the Royal Statistical Society (RSS) supporting the UK Health Security Agency (UKHSA).
Established in August 2020, the Turing-RSS Health Data Lab has focused on conducting world-leading statistical research to meet the needs of current and future health surveillance systems.
The Data for Research Programme has contributed to the Turing-RSS Health Data Lab mainly by providing data management and data querying support.
More information: www.turing.ac.uk/research/research-projects/turing-rss-health-data-lab