
CPRD Synthetic Data

Streamlining the process for researchers using the real CPRD datasets

Introduction

The cprd-data-wrangle repository is a resource created and maintained by the AI for Multiple Long Term Conditions Research Support Facility (AIM-RSF). It is intended for researchers working with the Clinical Practice Research Datalink (CPRD), and was made possible by the RSF gaining access to medium-fidelity synthetic versions of CPRD's datasets.

Researchers who need to understand the database tables, then query and filter them to create a research cohort, may find our pre-processing pipeline and interactive notebooks a helpful guide to getting started. The overarching goal of this work is to streamline the process for researchers using CPRD datasets through clear documentation, efficient data management strategies and analytical pipelines.

The GitHub Repository

The repository is available from our AIM-RSF GitHub Organisation.



We acknowledge and thank these groups for making this project possible:

  • National Institute for Health and Care Research (NIHR) for funding the AIM-RSF programme of work [NIHR202647] - see below.
  • The AI for Multiple Long Term Conditions Research Support Facility (AIM-RSF) programme for facilitating the delivery of this project.
    • This repository was created and is maintained by the AIM-RSF, led by Data Wranglers Rachael Stickland & Mahwish Mohammad.
  • Clinical Practice Research Datalink (CPRD) for access to synthetic versions of their datasets [synthetic data request no: SD000021].
  • The Alan Turing Institute. This project was supported in part through computational resources provided by The Alan Turing Institute under EPSRC grant EP/N510129/1.


The views expressed within any file in this repository are those of the author(s) within the AIM-RSF programme, and not necessarily those of the NIHR, the Department of Health and Social Care, the Medicines and Healthcare products Regulatory Agency (MHRA) or CPRD.

 

Would you like to contribute?

We welcome contributions from anyone, however small or large. If you choose to contribute to this repository, please do this in line with our code of conduct. If you want to contribute but you're not sure where to start, see our general guide to contributing.
 
 

Citation

Almarzouq, B., Mallon, A.-M., Mohammad, M., Stickland, R., Whitaker, K., & AIM-RSF team. (2025). Introduction to CPRD using synthetic datasets (cprd-data-wrangle). Zenodo: https://doi.org/10.5281/zenodo.13693615


 

© The Alan Turing Institute 2025. All rights reserved.

The Alan Turing Institute, a charity incorporated and registered in England and Wales with company number 09512457 and charity number 1162533 whose registered office is at British Library, 96 Euston Road, London, England, NW1 2DB, United Kingdom.
