Introduction
This project aims to develop methods and tools to assess privacy/utility trade-offs of synthetic data generation, and to build up a comprehensive picture of privacy and utility for existing generative models (GMs) that offer privacy guarantees.
Explaining the science
Publishing or otherwise sharing data sets, even 'anonymised' ones, has proven to carry unacceptable privacy risks in many application areas that involve significant amounts of personal data. Synthetic data generators (SDGs) are so far a promising alternative, but their use needs to be justified by methods that systematically assess their quality, in terms of both privacy and utility.
This project aims to develop methods and tools to assess privacy/utility trade-offs of synthetic data generation, and to build up a comprehensive picture of privacy and utility for existing generative models (GMs) that offer privacy guarantees.
This will give users of such methods confidence, provide deeper insight into the trade-offs between privacy and utility, and enable both the development of refined new SDG methods and the improved configuration of existing SDGs for new application domains in industry and the public sector. In addition, the project will contribute to improving the reproducibility of research results by defining a standard methodology and standard metrics that can be used to evaluate new SDGs.
Project aims
This project will:
- Design a modular framework for assessing SDGs
- Select and develop metrics suites to assess various aspects of the SDGs, with the ultimate aim of condensing these into a single metric
- Explore approaches for optimising the trade-off between privacy and utility
- Create a user-facing interface that allows users and organisations to obtain a privacy/risk assessment
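
As a rough illustration of how the aims above could fit together, the following is a minimal sketch of a modular assessment in which pluggable metrics are averaged into a utility score and a privacy score, then condensed into a single number. It is not the project's framework: the names `mean_similarity`, `non_copy_rate`, `assess` and `privacy_weight` are hypothetical, the two metrics are deliberately simplistic toy stand-ins, and the weighted average is only one assumed way of condensing several metrics into one.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence

# Assumption: every metric maps (real data, synthetic data) to a score
# in [0, 1], where higher is better.
Metric = Callable[[Sequence[float], Sequence[float]], float]


def mean_similarity(real: Sequence[float], synth: Sequence[float]) -> float:
    """Toy utility metric: 1.0 when the means match, lower as they diverge."""
    return 1.0 / (1.0 + abs(mean(real) - mean(synth)))


def non_copy_rate(real: Sequence[float], synth: Sequence[float]) -> float:
    """Toy privacy metric: share of synthetic values that are not exact
    copies of a real value. This is not a formal privacy guarantee."""
    real_values = set(real)
    return sum(v not in real_values for v in synth) / len(synth)


@dataclass
class Assessment:
    utility: float
    privacy: float
    combined: float


def assess(
    real: Sequence[float],
    synth: Sequence[float],
    utility_metrics: Sequence[Metric],
    privacy_metrics: Sequence[Metric],
    privacy_weight: float = 0.5,  # hypothetical knob for the trade-off
) -> Assessment:
    """Average each metric suite, then condense both into one score."""
    utility = mean(m(real, synth) for m in utility_metrics)
    privacy = mean(m(real, synth) for m in privacy_metrics)
    combined = privacy_weight * privacy + (1.0 - privacy_weight) * utility
    return Assessment(utility, privacy, combined)


real = [1.0, 2.0, 3.0, 4.0]
synth = [1.0, 2.5, 3.5, 3.0]
result = assess(real, synth, [mean_similarity], [non_copy_rate])
print(result)
```

Keeping the metrics as plain callables means a new metric can be added to either suite without changing `assess`, and varying `privacy_weight` gives one simple way to explore the trade-off between privacy and utility.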