Role of synthetic data in financial systems

Investigating the role and use of synthetic data in financial systems

Project status



Like all financial institutions, HSBC uses a large amount of data to support its primary goal of providing trusted services to customers. Personal data can play a significant role in protecting the bank and its customers from financial loss, including losses from bad lending decisions or fraud. This project therefore investigates how synthetic data can be used to undertake various analysis and modelling projects without relying directly on personal data, for example to better analyse customers' propensity to acquire further products (mortgages, loans, etc.) or to analyse fraud. The project also seeks to understand how to strike an appropriate balance between privacy and utility, and to support the understanding of synthetic data as a service by investigating commercially available synthetic data techniques and tools.

Explaining the science

There is a natural tension in modern data science between wanting to apply or develop novel AI techniques and wanting to preserve the privacy of individuals or other sensitive information. One way to address this is to use algorithms to construct synthetic datasets that are close enough to the real data to support the necessary development and testing, but far enough from it that they do not compromise the privacy of individuals or reveal sensitive information.
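The "close enough, but far enough" idea can be illustrated with a small sketch. This is not the project's method: it assumes a toy two-column dataset (hypothetical income and balance figures generated at random), fits a simple bivariate Gaussian to it, and samples synthetic records from that fit. It then compares column means as a crude fidelity check and computes the distance from the synthetic data to the closest real record as a crude privacy check. All function names and figures here are illustrative assumptions.

```python
import math
import random
import statistics

def fit_gaussian(records):
    """Estimate means, standard deviations and correlation of two columns."""
    xs = [r[0] for r in records]
    ys = [r[1] for r in records]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in records) / (len(records) - 1)
    rho = max(-1.0, min(1.0, cov / (sx * sy)))  # clamp against rounding error
    return mx, my, sx, sy, rho

def sample_synthetic(params, n, rng):
    """Draw n synthetic records from the fitted bivariate Gaussian."""
    mx, my, sx, sy, rho = params
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        y = my + sy * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        out.append((x, y))
    return out

def distance_to_closest_record(synthetic, real):
    """Smallest Euclidean distance between any synthetic and any real record."""
    return min(math.dist(s, r) for s in synthetic for r in real)

rng = random.Random(0)

# Hypothetical "real" data: (income, balance) pairs, made up for illustration.
real = []
for _ in range(500):
    income = rng.gauss(40_000, 10_000)
    balance = 0.1 * income + rng.gauss(0, 1_500)
    real.append((income, balance))

params = fit_gaussian(real)
synthetic = sample_synthetic(params, 500, rng)

# Fidelity: do the synthetic column means track the real ones?
real_mean_income = statistics.fmean(r[0] for r in real)
synth_mean_income = statistics.fmean(s[0] for s in synthetic)
print("mean income (real, synthetic):", real_mean_income, synth_mean_income)

# Privacy: a very small value would suggest a near-copy of a real record.
dcr = distance_to_closest_record(synthetic, real)
print("distance to closest real record:", dcr)
```

In this sketch the distance is measured in raw units, so the larger-scale column dominates; a more careful check would standardise the columns first. Real synthetic data tools use richer generative models and more rigorous privacy measures, but the trade-off being assessed is the same.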

Project aims

The project aims to understand the appropriate balance between privacy, fidelity and utility of synthetic data, and to develop methods for generating such data in the context of finance, for example for analysing banking customers’ propensity to acquire further products such as mortgages and loans.


As noted above, this work is being applied with our HSBC partners, but the ideas are broadly applicable, and we are collaborating with other Turing teams working in similar areas.


Researchers and collaborators

Contact info

Monica Vakil-Dewar
[email protected]