Trustworthy synthetic data in practice

Building a toolbox to generate and evaluate the trustworthiness, scalability, and usability of synthetic data, and co-ordinating synthetic data research

The project will build on and extend the capabilities developed in earlier work on synthetic data, and will address the associated challenges. We will work directly with data controllers to understand the challenges of evaluating the privacy of client datasets and of communicating these privacy concerns to data owners. The tools, methods, and metrics we develop will contribute to an integrated framework, operationalised through the Toolbox for Adversarial Privacy Auditing of Synthetic Data (TAPAS). This will enable data holders and others in the wider community to generate synthetic data and evaluate its trustworthiness, scalability, and usability.

Explaining the science

The project builds on previous work with synthetic datasets: datasets generated by algorithms to resemble real data closely enough for research purposes, while mitigating concerns around privacy and data availability.

The project focuses on the use of synthetic data in practice, specifically in the strategically important areas of net zero, digital twins, health, defence and security, and the digital economy. Our work with partners has helped shape this proposal, and we take a challenge-led approach to science and innovation.

Project aims

Our goals are to:

• Produce preprints describing a framework for assessing the trustworthiness of synthetic data and of the tools used to generate it.
• Integrate tools for time-series, network, and relational data (for finance, health, and cyber-physical systems (CPS) for digital twins) into the synthetic data generation (SDG) workflow.
• Prototype the integration of federated learning techniques with synthetic data.
• Further develop the TAPAS toolbox for SDG.
• Present a framework, with associated tools, to measure the trustworthiness of SDG.
• Run two workshops: one to explore the challenges of trust in synthetic data, and one to measure the efficiency of our work.


In 2021, Gartner predicted that by 2024, 60% of the data used for the development of AI would be synthetically generated. However, outstanding questions remain about the efficiency of synthetic data, and there is a growing need for quality assurance.

Through its workshops and its connection to the Turing Synthetic Data Interest Group, the project will build a community of researchers, practitioners, and end-users of synthetic data. By convening this group, we will build trust and understanding around the use of synthetic data and deliver world-class research.

The variety of our partners reflects the need for synthetic data in tackling diverse challenges, including financial fraud, digital twins for manufacturing and transport, and health data analysis.

