Machine learning model performance changes over time as the distributions and variances of fresh input data drift. We propose to automate the monitoring of both data and model performance. A long-term support infrastructure like Learning Machines ensures that technologies built on yesterday's data continue to be safe in the future.
Explaining the science
The first step for LM is to automate the generation of quality-control checks and descriptive statistics, both for the dataset itself and for metrics relating to the performance of a predictive model. We will use both single-value measures (e.g. the mean value of dataset features, model accuracy) and more sophisticated methods, such as autoencoders, to quantify how individual features and the relationships between them change over time. These measures will be used to determine when retraining the model is necessary.
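To make the single-value measures concrete, the following is a minimal sketch of an automated drift check that compares the feature means of a fresh batch against a reference dataset. The function name `check_feature_drift` and the threshold are illustrative assumptions, not part of the LM codebase.

```python
import numpy as np

def check_feature_drift(reference, batch, threshold=3.0):
    """Flag features whose batch mean drifts beyond `threshold`
    standard errors from the reference mean."""
    ref_mean = reference.mean(axis=0)
    ref_std = reference.std(axis=0, ddof=1)
    # Standard error of the batch mean under the reference distribution.
    se = ref_std / np.sqrt(batch.shape[0])
    z = np.abs(batch.mean(axis=0) - ref_mean) / se
    return z > threshold  # boolean mask of drifted features

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(1000, 3))
# Simulate fresh data where only the third feature has drifted.
batch = rng.normal([0.0, 0.0, 2.0], 1.0, size=(200, 3))
print(check_feature_drift(reference, batch))
```

A check like this would run automatically as each new batch of data arrives, with flagged features triggering further inspection or model retraining.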
The algorithms or analyses performed will initially be simplified 'toy models', or models that have been published previously for the Surveillance, Epidemiology, and End Results (SEER) dataset with open-source code. We will look at deploying well-established machine learning methods (e.g. random forests) to predict disease outcomes. The year of diagnosis in the SEER data will allow us to examine long-term trends and changes. We also plan to investigate approaches for generating synthetic data that drifts over time.
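As a hedged illustration of the synthetic-drift idea (not the project's actual code), the sketch below generates yearly cohorts in which the underlying decision boundary shifts a little each year, then measures how a random forest trained on the first year degrades on later cohorts. The drift rate and cohort sizes are arbitrary assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)

def make_year(year, n=500):
    """One synthetic yearly cohort; the concept drifts with `year`."""
    X = rng.normal(size=(n, 4))
    shift = 0.4 * year  # the true decision boundary moves each year
    y = (X[:, 0] + X[:, 1] > shift).astype(int)
    return X, y

# Train on year 0 only, then evaluate on each subsequent year.
X0, y0 = make_year(0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X0, y0)

accuracies = []
for year in range(5):
    Xt, yt = make_year(year)
    acc = accuracy_score(yt, model.predict(Xt))
    accuracies.append(acc)
    print(f"year {year}: accuracy {acc:.3f}")
```

Because the labels depend on a boundary that moves away from the one the model learned, accuracy falls steadily with each simulated year, mimicking the long-term trends a year-of-diagnosis field would expose in real SEER data.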
It is important to note that the choice of algorithm or analysis is not itself the focal point of the project; the focus is on building the infrastructure required to detect how features of datasets change over time as new data is accumulated, and how this affects algorithm performance.
The purpose of Learning Machines (LM) is to support the provision of augmented intelligence in the healthcare and justice domains. Learning Machines is the infrastructure required to support the continuous use of machine learning techniques over time. As such, it is generalizable across use cases and across machine learning approaches.
The project will investigate, design and build the software infrastructure necessary to support the continuous appraisal of data and an algorithm’s performance on that data, for the long term.
This project will prioritise approaches and components to enable good research engineering practices with minimal overheads to researchers themselves.
One aspect of LM addresses the fact that machine learning techniques in areas such as healthcare and criminal justice can be used to make high-stake decisions.
Therefore, in addition to ensuring that performance is maintained over time, LM will advocate different aspects of "Trustworthy Machine Learning". These include issuing uncertainty bounds for predictions, providing an interpretability block that explains the model's predictions to the domain expert (e.g. the clinician), and ensuring that the fairness criterion of choice is satisfied.
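One simple way to obtain per-instance uncertainty bounds from a random forest is to look at the spread of the individual trees' predictions. The sketch below illustrates that idea only; it is not the LM implementation, and the dataset is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=6, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Probability of class 1 from each individual tree, for five instances.
tree_probs = np.stack(
    [tree.predict_proba(X[:5])[:, 1] for tree in forest.estimators_]
)
# The ensemble mean is the point prediction; the spread across trees
# gives a crude per-instance uncertainty bound.
mean = tree_probs.mean(axis=0)
std = tree_probs.std(axis=0)
for m, s in zip(mean, std):
    print(f"p(class 1) = {m:.2f} +/- {s:.2f}")
```

Instances on which the trees disagree strongly get wide bounds, which is exactly the signal a clinician would want before trusting a high-stakes prediction.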
We presented our first demo to communicate our work and findings. In this demo we trained five versions of a random forest model using the Surveillance, Epidemiology, and End Results (SEER) dataset.
The model addressed the task of predicting five-year survival likelihood for patients with breast cancer, using data from 1998 to 2011. We incorporated different components of LM into this model, including visualisations that provide instance-wise feature importance (interpretation), descriptive statistics (to see whether the distribution of the data changes over time), and model uncertainty bounds.
This demo shows both 1) models' performance changing over time, and 2) individual patient predictions' certainty values changing as models are subsequently updated, alongside 'explanations' of why the predictions now differ.
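A toy sketch of instance-wise feature importance is given below: each feature's score is the change in predicted probability when that feature is neutralised to its training mean. This is an illustrative stand-in for the demo's interpretation component, not its actual method, and the helper `instance_importance` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=4, random_state=1)
model = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)
means = X.mean(axis=0)

def instance_importance(model, x, means):
    """Per-feature importance for one instance: the drop in p(class 1)
    when that feature is replaced by its training mean."""
    base = model.predict_proba(x.reshape(1, -1))[0, 1]
    scores = []
    for j in range(len(x)):
        x_j = x.copy()
        x_j[j] = means[j]  # neutralise feature j
        scores.append(base - model.predict_proba(x_j.reshape(1, -1))[0, 1])
    return np.array(scores)

print(instance_importance(model, X[0], means))
```

Comparing such scores between two model versions for the same patient gives a rough 'explanation' of why an updated model predicts differently.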
Researchers and collaborators