Introduction
Cloud computing services offer hugely scalable resources, but don't always guarantee the confidentiality and integrity of compute jobs. By taking advantage of new hardware technology for trusted computing, this project aims to create a new big data platform that allows for secure computing in the cloud.
Explaining the science
Much of today's data is stored and processed in the cloud. Cloud computing relies on many data centres, often vast, warehouse-sized facilities in heavily protected environments. Data scientists would like to rent these scalable computing resources. However, the cloud may not always meet stringent technical, legal, and regulatory requirements, especially for data that is personal and subject to privacy laws, or that is commercial-in-confidence.
Data 'safe havens' provide strict isolation between the different tenants or customers of a data centre, but this may not be sufficient for some uses (patient records, criminal record data, financial service audit data) where there is still a residual risk from insider attacks or from software (or hardware) vulnerabilities. The next step is to take advantage of new hardware technology for trusted computing from Intel and from ARM, who have built hardware 'enclaves' to protect sensitive data.
A secure enclave is trusted hardware that provides a secure container into which the cloud user can upload encrypted private data, securely decrypt it, and compute on it. Both the decryption and the computation run inside a processor which, in principle, cannot be broken into even by its owner.
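The upload, decrypt, and compute flow described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the project's design or a real enclave API: `ToyEnclave`, `seal`, `unseal`, and `run_demo` are hypothetical names, the cipher is a toy built from the Python standard library and is not suitable for real use, and the shared key is simply assumed to have been agreed beforehand (on real hardware this step typically relies on remote attestation).

```python
import hashlib
import hmac
import json
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. Illustration only, not real crypto."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt and authenticate; returns nonce || ciphertext || tag."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    # A real scheme would use separate keys for encryption and authentication.
    tag = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def unseal(key: bytes, blob: bytes) -> bytes:
    """Verify the tag (integrity), then decrypt (confidentiality)."""
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    stream = _keystream(key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


class ToyEnclave:
    """Hypothetical stand-in for an enclave: only code in here sees plaintext."""

    def __init__(self, key: bytes):
        self._key = key  # assumption: provisioned securely beforehand

    def mean(self, sealed_values: bytes) -> bytes:
        values = json.loads(unseal(self._key, sealed_values))
        result = sum(values) / len(values)
        # The result leaves the enclave encrypted as well.
        return seal(self._key, json.dumps(result).encode())


def run_demo():
    key = secrets.token_bytes(32)
    enclave = ToyEnclave(key)
    records = [4, 8, 15, 16, 23, 42]
    upload = seal(key, json.dumps(records).encode())  # all the cloud host sees
    sealed_result = enclave.mean(upload)
    return json.loads(unseal(key, sealed_result)), upload


if __name__ == "__main__":
    print(run_demo()[0])  # prints 18.0
```

In this sketch the untrusted host only ever handles the sealed blobs, and any modification of an upload is rejected by the integrity check before the computation runs.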
Project aims
This project investigates and prototypes the secure software components that are needed to take advantage of new hardware like secure enclaves. The aim is to create a new, secure big data platform that provides explicit guarantees about the confidentiality and integrity of compute jobs in untrusted cloud environments.
It is also key that such a platform remains compatible with existing programming models, so that data scientists can continue to use the familiar and effective tools they already rely on (for example Hadoop, Spark, Flink).
Applications
The resulting research will answer an industry need to enable data sharing without compromising data privacy, with a potentially transformative effect for sectors such as health, financial services, and criminal justice.