Usually, when a company uses personal data, the data is collected in a manner that allows the company to see it unobscured. By adding noise to the data before the company sees it, the amount of information leaked about a person can be provably bounded; however, obtaining good bounds is hard. In this project, alternative security assumptions and cryptography are used to reduce the information leakage for a given noise level.
Explaining the science
In this project the primary notion of privacy is that of 'differential privacy'. To understand this notion, imagine you are a data provider with a choice: tell the truth to a data collector, or fabricate a (presumably more flattering) lie. A mechanism for data collection is said to be differentially private if your choice of whether or not to lie doesn't significantly change the probability of the collector seeing any particular outcome. The collector thus cannot infer what data you, as an individual, provided.
Whilst this guarantee intrinsically requires uncertainty to be added, with a sufficient number of providers it is possible to obtain accurate statistics for the population.
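The classic illustration of both points is 'randomized response': each provider flips their answer with some fixed probability, which bounds what the collector can learn about any one provider, yet the bias this introduces is known and can be inverted to recover an accurate population statistic. The sketch below is illustrative only (the parameter `p_truth` and both function names are our own, not part of any algorithm this project has produced).

```python
import random

def randomized_response(truth: bool, p_truth: float = 0.75) -> bool:
    """Report the true bit with probability p_truth, else report its flip.

    The collector sees either answer with non-trivial probability whether
    or not the provider lied, which is exactly the differential privacy
    guarantee described above (here with epsilon = ln(0.75/0.25) = ln 3).
    """
    return truth if random.random() < p_truth else not truth

def estimate_proportion(reports: list[bool], p_truth: float = 0.75) -> float:
    """Unbiased estimate of the true fraction f of `True` answers.

    The observed fraction q satisfies E[q] = f*p_truth + (1-f)*(1-p_truth),
    so we solve that equation for f.
    """
    q = sum(reports) / len(reports)
    return (q - (1 - p_truth)) / (2 * p_truth - 1)

# Simulate a population where 30% truthfully hold the attribute.
random.seed(0)
truths = [random.random() < 0.3 for _ in range(200_000)]
reports = [randomized_response(t) for t in truths]
print(round(estimate_proportion(reports), 3))  # close to 0.3
```

Each individual report is heavily randomized, yet with 200,000 providers the aggregate estimate lands close to the true 30%: the uncertainty is intrinsic per provider but averages out over the population.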
It seems intuitive that providing your data anonymously helps keep it private. Whilst anonymity alone can still allow substantial loss of personal information, it turns out that it can significantly enhance the privacy guarantees provided, so long as it is combined with at least a little randomization.
This project aims to produce new algorithms for computing statistics of interest. It will be a success if the algorithms produced compute useful statistics accurately whilst providing a stronger privacy guarantee and/or requiring substantially less input data than the previous state of the art for that statistic. It is also strongly preferable that the algorithms be practical enough to deploy in the real world.
Such algorithms will allow privacy-preserving practices based on cryptography to be deployed not just for products with millions of users but also for products with thousands of users. Thus they would be a step towards allowing privacy-preserving data collection/release techniques to become a standard in industry.
Privacy-preserving efficient data collection could be deployed in an array of different scenarios. Multiple hospitals could run statistical analyses on their combined datasets, without endangering patient privacy. Better guarantees could be provided for the safety of releasing census data.
The type of application most suited to this project's work is that of a company wishing to update a product on the basis of how users interact with it. The ultimate goal of this line of research is to allow most such applications to be updated without anyone having their data exposed, i.e. whilst providing every user with differential privacy.