In recent years there has been a revolution in neural network technologies, but these networks still generally lack interpretability and explainability. This project aims to enhance neural networks by incorporating human-readable explanations into both the training process and the testing of the network's outputs. This work will add crucial transparency and accountability to neural networks, and will help ensure that the resulting networks generalise better to multiple purposes.

Explaining the science

Existing attempts at interpretability and explanation in neural network technologies mostly focus on providing explanations after the networks have been trained and fixed. However, such post-hoc explanations can at best identify potentially incorrect learned behaviour, without offering a straightforward, general way to correct it.

This project proposes a novel, general solution both for explaining neural networks and for correcting their incorrect behaviour. More specifically, the approach 'injects' a layer of human-intelligible explanations of a network's desired outputs during training, and requires the network to provide these explanations when it is tested.

As a result, the networks produced will be guided to counteract biases and learn the desired functionality, while end-users gain human-readable insight into their inner workings.
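The page does not specify how the explanations enter training. One possible reading of 'injecting' explanations is multi-task supervision: the network predicts both a label and a natural-language explanation, and the two losses are added together. The Python sketch below illustrates that reading under stated assumptions only; the names `joint_loss`, `predict` and `expl_weight` are hypothetical, and plain probability lists stand in for real model outputs.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch, not the project's method: assumes a model that outputs
# (a) a probability distribution over task labels and (b) one probability
# distribution over an explanation vocabulary per explanation token position.


def nll(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of `target` under the distribution `probs`."""
    return -math.log(probs[target])


def joint_loss(
    label_probs: Sequence[float],
    label: int,
    expl_token_probs: Sequence[Sequence[float]],
    expl_tokens: Sequence[int],
    expl_weight: float = 1.0,
) -> float:
    """Task loss plus a weighted, length-normalised explanation loss.

    Supervising the explanation alongside the label is the assumed meaning
    of 'injecting' explanations during training in this sketch.
    """
    if not expl_tokens or len(expl_token_probs) != len(expl_tokens):
        raise ValueError("need one distribution per (non-empty) explanation token")
    task_loss = nll(label_probs, label)
    expl_loss = sum(
        nll(p, t) for p, t in zip(expl_token_probs, expl_tokens)
    ) / len(expl_tokens)
    return task_loss + expl_weight * expl_loss


def predict(
    label_probs: Sequence[float], expl_token_probs: Sequence[Sequence[float]]
) -> Tuple[int, List[int]]:
    """At test time, return the predicted label together with its explanation.

    Greedy argmax per token position is a simplifying assumption.
    """

    def argmax(xs: Sequence[float]) -> int:
        return max(range(len(xs)), key=lambda i: xs[i])

    return argmax(label_probs), [argmax(p) for p in expl_token_probs]


if __name__ == "__main__":
    label_probs = [0.7, 0.2, 0.1]
    expl_probs = [[0.5, 0.5], [0.9, 0.1]]
    print(joint_loss(label_probs, 0, expl_probs, [0, 0]))
    print(predict(label_probs, expl_probs))
```

Setting `expl_weight` to 0 reduces the objective to ordinary label-only training, so in this sketch the weight controls how strongly the explanations guide learning, and `predict` mirrors the requirement that every decision comes with an explanation.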

Project aims

The aim is to produce models that can use natural language explanations to improve performance, counteract statistical biases in datasets, and provide explanations for the decisions they make.

Success will be defined both in terms of quantitative results (i.e. higher accuracy in the decisions made) and qualitative results (i.e. high-quality explanations for the decisions made).

It is crucial to develop frameworks for explaining the processes, services, and decisions delivered by algorithmic systems, to improve transparency and accountability, as highlighted by the EU General Data Protection Regulation. Among other provisions, this regulation introduces a right to explanation for users affected by algorithmic decisions, e.g. in healthcare, law, or finance.


The work will focus on natural language modelling as the primary application for the models produced. However, the techniques will be designed to be generic enough to apply to other areas as well, such as computer vision or policy learning.

Recent updates

March 2018: Project received seed funding from The Alan Turing Institute.


Researchers and collaborators