The counterfactual fairness project concerns the impact of computer algorithms on society, in particular how to prevent automated processes from resulting in unfair discrimination. Unfairness may appear in data for reasons that include sampling biases, implicit or intentional human biases, spurious correlations, unequal opportunities, legal discrimination, or other forms of historical adversity. If the data input to a machine learning method is unfair, the output will generally be unfair as well.
A growing number of researchers are recognizing the importance of this problem and have proposed a variety of solutions, usually by formulating a mathematical definition of fairness and describing algorithms that achieve it. But it is not always clear whether a given definition or algorithm properly addresses the underlying sources of unfairness. We advocate an approach to fairness that carefully models the causal and generative factors behind the data. Notions of fairness that do not account for underlying causal relationships may perpetuate, or even amplify, unfairness.
Determining causal relationships can be more challenging, and requires more assumptions, than analyses based on correlation alone. Fortunately, methods from the field of causal inference can be adapted to this setting. Moreover, by modelling the causal structure explicitly, this approach is transparent about how it achieves fairness, which may be important for decision makers or regulators responsible for enforcing it.
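To illustrate why causal structure matters, the following sketch uses an entirely hypothetical structural causal model (the variables U, A, X, Y, their linear structural equations, and all coefficients are invented for illustration, not taken from any real dataset or from the project itself). It contrasts a predictor that simply ignores the protected attribute with one built by abducting the latent background variable, and checks how each prediction shifts under a counterfactual flip of the protected attribute:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical structural causal model (illustrative only):
#   U ~ N(0, 1)              latent background factor (e.g., aptitude)
#   A ~ Bernoulli(0.5)       protected attribute
#   X = U + 2*A + noise      observed feature influenced by A
#   Y = U + noise            outcome driven only by U
U = rng.normal(size=n)
A = rng.integers(0, 2, size=n)
X = U + 2 * A + 0.1 * rng.normal(size=n)
Y = U + 0.1 * rng.normal(size=n)

# "Unaware" predictor: regress Y on X. It never sees A directly,
# but X carries A's causal effect, so the prediction still does.
w_unaware = np.polyfit(X, Y, 1)

# Causally motivated predictor: abduct the latent U from (X, A)
# using the assumed structural equation, then predict from U alone.
U_hat = X - 2 * A
w_fair = np.polyfit(U_hat, Y, 1)

# Counterfactual test: flip A while holding U fixed, regenerate X,
# and measure how much each predictor's output changes.
A_cf = 1 - A
X_cf = U + 2 * A_cf + 0.1 * rng.normal(size=n)
U_hat_cf = X_cf - 2 * A_cf

gap_unaware = np.mean(np.abs(np.polyval(w_unaware, X) - np.polyval(w_unaware, X_cf)))
gap_fair = np.mean(np.abs(np.polyval(w_fair, U_hat) - np.polyval(w_fair, U_hat_cf)))
print(f"unaware prediction shift under counterfactual: {gap_unaware:.2f}")
print(f"abduction-based prediction shift:              {gap_fair:.2f}")
```

Under this toy model the unaware predictor's output shifts substantially when the protected attribute is counterfactually flipped, while the abduction-based predictor's output is nearly invariant, because it depends only on the latent factor that is unaffected by the intervention.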