Much of today’s data is stored and processed in the cloud. Cloud computing relies on many data centres, often vast, warehouse-sized facilities in heavily protected environments. Data scientists would like to rent these scalable computing resources. However, the cloud may not always meet stringent technical, legal and regulatory requirements, especially for data that is personal and subject to privacy laws, or that is commercial-in-confidence.
Data ‘safe havens’ provide strict isolation between the different tenants or customers of a data centre, but this may not be sufficient for some uses, such as patient records, criminal record data and financial services audit data, where there is still a residual risk from insider attacks or from software (or hardware) vulnerabilities.
The next step is to take advantage of new hardware technology for trusted computing from Intel and from ARM, who have built hardware ‘enclaves’ to protect sensitive data.
This project investigates and prototypes the secure software components that are needed to take advantage of the new hardware.
We aim to create a new, secure big data platform that provides explicit guarantees about the confidentiality and integrity of compute jobs in untrusted cloud environments, while remaining compatible with existing programming models, so that data scientists can continue to use familiar and effective tools (for example Hadoop, Spark and Flink).
The resulting research will answer an industry need to enable data sharing without compromising data privacy, with a potentially transformative effect for sectors such as health, financial services and criminal justice.