Introduction
Datasets are often stored in silos spread across organisations and are not easy to share with outside entities (e.g. the academic community) or with different departments within
Synthetic data generators (SDGs) enable users to share and link data, to work with data in safe environments, to fix structural deficiencies in data, to increase the size of the data, and to validate machine
This project aims to produce state-of-the-art data generators for both structured and unstructured datasets, as well as metrics for evaluating the utility and privacy of synthetic
Explaining the science
This project will draw on recent methodological developments in network modelling and the application of the signature method for data description. Combining these
Project aims
Synthetic data has many possible use cases, such as increasing the size of a dataset, fixing structural deficiencies, or enabling researchers to test machine learning algorithms
Alongside this, synthetic data has the potential to enable easier access to synthetic versions of sensitive datasets, democratising research and allowing greater sharing of data
Working with the Office for National Statistics (ONS), we seek to create state-of-the-art synthetic data generators, alongside metrics for assessing the utility and privacy of synthetic data to bolster data sharing
Applications
The potential applications of SDGs are numerous, ranging from simple synthetic datasets for software development to allowing researchers access to synthesised versions of
This project is concerned with building up a useful framework for generating synthetic data, as well as assessing its privacy and utility. Doing this in collaboration with the ONS