Introduction
Cynthia Dwork is renowned for placing privacy-preserving data analysis on a mathematically rigorous foundation. A cornerstone of this work is differential privacy, a strong privacy guarantee that frequently permits highly accurate data analysis.
About the event
In machine learning, algorithms produce scores, which are often interpreted as probabilities: "What is the probability that this student will graduate within four years?"; "What is the chance that this loan will be repaid?"; "What is the probability that this tumour will metastasise under the given course of treatment?". Speaking intuitively, the goal in machine learning and AI is to produce these probabilities from training data, i.e. from evidence in the form of labelled examples: descriptions of individuals (inputs to the prediction algorithm), together with their outcomes (loan repaid or not).
The problem here is that we cannot even say what a ‘probability’ is! A 50% probability of heads for a coin means that if you flip it 1,000 times you’re very likely to get about 500 heads. But when you’re talking about an individual, you can’t run her through college, rewind her, run her through college again, rewind her, and see the fraction of those runs in which she graduates in four years. And without knowing what an ‘individual probability’ is, how can we design an algorithm to produce it? How do we know when we have succeeded?
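The coin side of this contrast is easy to check numerically. A minimal sketch: the exact binomial probability that 1,000 flips of a fair coin land between 450 and 550 heads. The function name `prob_heads_in_range` and the ±50 window are illustrative choices, not part of the talk.

```python
from math import comb


def prob_heads_in_range(n: int, lo: int, hi: int, p: float = 0.5) -> float:
    """Exact binomial probability that n independent flips, each landing heads
    with probability p, produce between lo and hi heads (inclusive)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))


# Flip a fair coin 1,000 times: how likely is a head count within 50 of 500?
print(prob_heads_in_range(1000, 450, 550))
```

The result is above 0.99, which is the precise sense in which "about 500 heads" is very likely. The calculation relies on being able to repeat the same experiment many times, which is exactly what is unavailable for a single student.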
This talk describes Outcome Indistinguishability, an approach to forecasting grounded in complexity theory, developed with Michael Kim, Omer Reingold, Guy Rothblum and Gal Yona.
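For readers who want something concrete, here is a toy, sample-based illustration of the simplest form of the idea, assuming the following setup: a predictor assigns each individual x a number p̃(x) in [0, 1]; a modelled outcome õ is drawn from Bernoulli(p̃(x)); and the predictor passes if no distinguisher in a given class can tell pairs (x, o*) with real outcomes apart from pairs (x, õ) by more than ε. The finite hand-picked distinguisher class, the synthetic two-group data, and the names `oi_gap` and `passes_oi_check` are all hypothetical. The framework itself considers distinguishers drawn from a computational complexity class and measures the gap over the underlying distribution rather than over one finite sample.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical representation: each individual is a small feature dictionary.
Individual = Dict[str, str]
# A distinguisher sees one individual together with an outcome and outputs a bit.
Distinguisher = Callable[[Individual, int], int]
Predictor = Callable[[Individual], float]


def oi_gap(individuals: Sequence[Individual], outcomes: Sequence[int],
           predictor: Predictor, distinguisher: Distinguisher,
           rng: random.Random) -> float:
    """Empirical |Pr[A(x, o*) = 1] - Pr[A(x, o~) = 1]|, where o* is the real
    outcome and o~ is drawn from Bernoulli(predictor(x))."""
    n = len(individuals)
    real = sum(distinguisher(x, o) for x, o in zip(individuals, outcomes)) / n
    modelled = sum(distinguisher(x, int(rng.random() < predictor(x)))
                   for x in individuals) / n
    return abs(real - modelled)


def passes_oi_check(individuals: Sequence[Individual], outcomes: Sequence[int],
                    predictor: Predictor, distinguishers: Sequence[Distinguisher],
                    eps: float, seed: int = 0) -> bool:
    """True if no distinguisher in the (finite, hand-picked) class has gap > eps."""
    rng = random.Random(seed)
    return all(oi_gap(individuals, outcomes, predictor, a, rng) <= eps
               for a in distinguishers)


# --- Toy usage on synthetic data (all numbers are made up for illustration) ---
data_rng = random.Random(42)
people: List[Individual] = [{"group": "A" if i % 2 == 0 else "B"} for i in range(20_000)]


def true_p(x: Individual) -> float:
    # Hypothetical ground truth used only to generate the synthetic outcomes.
    return 0.8 if x["group"] == "A" else 0.3


real_outcomes = [int(data_rng.random() < true_p(x)) for x in people]


def flat_p(x: Individual) -> float:
    return 0.55  # ignores the individual: right on average, wrong for each group


def positive(x: Individual, o: int) -> int:
    return o  # fires whenever the outcome is positive


def positive_in_a(x: Individual, o: int) -> int:
    return int(x["group"] == "A" and o == 1)  # fires on positive outcomes in group A


tests = [positive, positive_in_a]
print(passes_oi_check(people, real_outcomes, true_p, tests, eps=0.03))
print(passes_oi_check(people, real_outcomes, flat_p, tests, eps=0.03))
```

The predictor that uses the true group-level probabilities should pass, while the flat 0.55 predictor, which matches the overall positive rate, is caught by the group-A distinguisher. Note that the check never asks what an individual's "true" probability is; it only asks whether real outcomes can be told apart from outcomes simulated by the predictor, and a richer distinguisher class makes that requirement stronger.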