The emergence of flexible 'theory-free' learning algorithms has revolutionised pattern recognition. This is useful for categorisation and detection tasks, such as natural language processing, but offers few insights into underlying causal processes. Distinguishing cause from correlation is, however, essential if our interests stretch beyond prediction and forecasting into understanding how variables influence each other, and how interventions can be used to meaningfully change a system. This group is interested in all aspects of this ‘third task of data science’, which is broadly termed ‘causal inference’.
Explaining the science
Most analytical data science can be divided into three tasks: description, prediction, and causal inference. Description focuses on describing patterns and trends, often visually, in order to generally understand the occurrence of a concept of interest, e.g. how many people have diabetes, or how the risk of violent crime varies between areas. Prediction focuses on identifying patterns and forecasting possibilities, e.g. whether someone might develop a disease, or what the UK temperature might be in 2050. Causal inference focuses on determining how one thing influences another, and specifically on estimating how changing one thing might change another, e.g. how does weight management affect the future risk of diabetes, or how would the risk of violent crime change if a minimum unit price for alcohol were introduced?
Causal inference questions address some of the most interesting and impactful issues, but they are also some of the most difficult. Unlike with description and prediction, the answers cannot be 'learnt' purely from the data, and instead require either strict conditions or expert knowledge. Revealing that knowledge, and in turn estimating causal effects, have become easier with the emergence of various specific ‘causal inference’ methods, such as causal diagrams, but the broader field remains in its infancy. Despite this, causal inference methods have already helped to explain and reveal a number of challenges, pitfalls, and apparent paradoxes in routine data science, particularly concerning non-experimental data of the kind that now dominates our world.
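To illustrate why causal answers cannot be 'learnt' purely from the data, the following is a minimal sketch using simulated data. Everything in it is a hypothetical assumption made for illustration: a confounder `z` drives both an exposure `x` and an outcome `y`, while `x` has no causal effect on `y` at all. The helper functions `slope`, `residuals`, and `simulate` are written for this example and are not part of any library.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y regressed on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def residuals(x, y):
    """What is left of y after removing its linear dependence on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b = slope(x, y)
    return [(yi - mean_y) - b * (xi - mean_x) for xi, yi in zip(x, y)]


def simulate(n=20000, seed=0):
    """Hypothetical data-generating process: z causes x and y; x does not cause y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]          # confounder
    x = [zi + rng.gauss(0, 1) for zi in z]           # exposure, driven by z
    y = [2 * zi + rng.gauss(0, 1) for zi in z]       # outcome, driven by z only
    return z, x, y


z, x, y = simulate()

# Naive association: regress y on x, ignoring the confounder.
naive = slope(x, y)

# Adjusted estimate: remove the linear influence of z from both x and y,
# then regress the residuals (equivalent to including z in the regression).
adjusted = slope(residuals(z, x), residuals(z, y))

print(f"naive slope of y on x:    {naive:.2f}")
print(f"slope after adjusting z:  {adjusted:.2f}")
```

Under these assumptions the naive slope is close to 1, even though the true causal effect of `x` on `y` is zero, and the adjusted slope is close to 0. The adjustment only works here because the simulation tells us that `z` is the confounder. In real non-experimental data, that knowledge has to come from outside the data, which is where tools such as causal diagrams come in.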
Peter Tennant, who co-leads the interest group, delivered a general introduction to ‘causal inference’ (and the need for a ‘causal inference approach’) at the University of Oxford Big Data Institute in March 2019.
We are primarily interested in bringing together data scientists with an interest in the quantitative science of ‘causal inference’ to help increase awareness, understanding, and use of appropriate methods, and so improve the relevance and utility of data science work. The key areas of scientific interest include:
- Translating causal inference methods into applied data science
- Advancing our fundamental understanding of data science using causal inference methods
- Developing and adapting new causal inference methods
- Increasing awareness, education, and training in causal inference methods
- Improving prediction modelling with causal inference methods
- Understanding, explaining, and developing algorithms to be fair and ethical using causal inference methods
Translating causal inference methods into applied data science
Challenges: Causal inference methods remain underused in applied data science and have struggled to move out of their theoretical origins. This theme is focused on translating causal inference methods into applied data science.
Advancing our fundamental understanding of data science using causal inference methods
Challenges: Causal inference methods can offer tremendous insights into the challenges, pitfalls, and apparent paradoxes that occur in routine data science. This theme is focused on exploring, revealing, and solving various challenges and confusions in applied data science, offering solutions where possible.
Developing and adapting new causal inference methods
Challenges: Causal inference is an evolving and emerging area. This theme is focused on developing and adapting new methods for causal inference.
Increasing awareness, education, and training in causal inference methods
Challenges: Awareness and understanding of causal inference remain very low in the UK data science workforce. This theme is focused on increasing awareness of, and training in, causal inference methods.
Improving prediction modelling with causal inference methods
Challenges: Building accurate and reliable prediction models that work as intended in many different settings is particularly challenging. This theme is focused on how causal inference can be used to improve the performance and reliability of prediction models, whilst enabling those models to provide meaningful inferences.
Understanding, explaining, and developing algorithms to be fair and ethical using causal inference methods
Challenges: Machine learning algorithms can have many social and practical implications, not all of which are desirable, and hidden biases can undermine the fairness and ethics of some implementations. Understanding, explaining, and modifying algorithms to be fairer and more ethical is therefore an area of significant interest. This theme is focused on how causal inference can be used to understand the features and consequences of an algorithm so that it can be improved.
The group is currently organising new talks, meetings, and a reading group. More information to follow shortly.