Introduction
The era of ‘personalised medicine’, ushered in by advances in the technology of genome sequencing, has another transformation on the horizon. The combination of the world’s largest, high-quality cancer data collection service, world-leading expertise in healthcare-focused machine learning and high-performance computing is poised to produce nothing less than a quantum leap in personalised medicine.
The teams of Turing Fellow Mihaela van der Schaar, the John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine (ML-AIM) at the University of Cambridge, and Jem Rashbass, the National Director for Disease Registration of Public Health England, have joined forces in research supported by the UK Research and Innovation’s strategic investment fund. They are tackling the personalisation of cancer care.
“I think we’re seeing the future of medicine.”
Sir Alan Wilson, Director of Special Projects at the Turing
When deciding on a treatment for a given form of cancer, clinical decisions are often made on the basis of results from randomised controlled trials of treatments involving that cancer. Essentially, that’s population-level medicine – an average – applied to the individual. That’s a useful but blunt tool, because no one is truly average: everyone has a mix of health history, other conditions (co-morbidities), genetic make-up, lifestyle and other variables that makes them unique.
Now, imagine a machine learning system with an understanding not only of every detail of that person’s entire clinical history and the trajectory of their disease, but also those of large numbers of similar clinical cases drawn from a nationwide database that is constantly updated in near real time. With the clinician’s push of a button, such a system would be able to provide patient-specific predictions of expected outcomes if no treatment is provided, and also the risks for a variety of treatment choices, all to support the clinician and patient in making what may be life-or-death decisions.
This system now exists, in a demonstrator mode fuelled by anonymised real-world data from England’s National Disease Registration Service (NDRS) and powered by the latest in machine learning. “I think we’re seeing the future of medicine,” says Sir Alan Wilson, Director of Special Projects at the Turing.
See accompanying slides here.
How did it start?
Good data science is impossible without high-quality data. This is where pioneering work by Jem Rashbass and his team of hundreds at Public Health England comes in.
Their work focused on cancer because even though “cancer” is actually an umbrella term representing a wide range of different diseases, personalisation is probably better understood in cancer than in any other discipline, with respect to heritable risk, screening, treatment options and outcomes. “How do I get record-level, identifiable data on every patient with cancer across the whole care pathway, and then curate that to sufficient quality so that I can use it for […] machine learning? That was the challenge. That has taken us 15 years,” says Rashbass.
"[This] challenge has taken us 15 years."
Jem Rashbass, National Director for Disease Registration, Public Health England
What’s more, this is no static database. The NDRS now has cancer patient data pouring in 24/7 from thousands of multidisciplinary clinical team meetings, 600 local secondary care systems, 142 chemotherapy centres, 82 breast screening centres, 56 radiotherapy centres, 22 molecular testing labs, and 26 other types of data source. It collects entire, and ongoing, patient histories: age, genomic data, co-morbidities, lifestyle, previous treatments and outcomes. Anonymised data from this enormous range of sources are integrated into a dynamic, ever-evolving trove of patient data.
And this is where machine learning comes in, courtesy of pioneering work by van der Schaar and her students in her research lab, ML-AIM, in collaboration with Rashbass. “I met Jem in October 2018, and we immediately clicked because he wanted to do something at scale – to make an enormous impact,” says van der Schaar. “Jem and his team want to build an entire system of health care.”
What happened?
There are serious challenges in creating algorithms of sufficient complexity and nuance to deal with disparate clinical data drawn from myriad sources.
In other areas of medicine, some research groups are using ‘hidden Markov models’ (HMMs) to model disease progression. These statistical, probabilistic models are relatively easy to compute and interpret, but the problem with them, says van der Schaar, is that they “compact the patient’s entire clinical history into a single current state, thereby ‘forgetting’ what has happened to this particular patient.” This is a problem because in medicine, patient history – such as the order and duration of a variety of health events – really matters.
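To make the ‘forgetting’ concrete, here is a minimal sketch of a memoryless forecast. It is not taken from any of the models discussed here: the three states, the transition probabilities and the function name markov_forecast are all invented for illustration, and the states are treated as directly observed, whereas a real HMM would infer them from clinical measurements.

```python
# Hypothetical three-state disease model. The state names and the
# probabilities are invented for illustration, not clinical estimates.
TRANSITIONS = {
    "stable":      {"stable": 0.8, "progressing": 0.2, "advanced": 0.0},
    "progressing": {"stable": 0.1, "progressing": 0.6, "advanced": 0.3},
    "advanced":    {"stable": 0.0, "progressing": 0.0, "advanced": 1.0},
}


def markov_forecast(history):
    """Next-state distribution under a first-order Markov assumption.

    Only the most recent state, history[-1], is consulted; everything
    that happened earlier is ignored.
    """
    return TRANSITIONS[history[-1]]


# Two hypothetical patients with very different pasts but the same current state.
patient_a = ["stable", "stable", "stable", "progressing"]
patient_b = ["progressing", "progressing", "progressing", "progressing"]

forecast_a = markov_forecast(patient_a)
forecast_b = markov_forecast(patient_b)
print(forecast_a == forecast_b)  # the two forecasts are identical
```

Because the forecast reads only the last entry of the history, a patient who has just started to progress and one who has been progressing for a long time are indistinguishable to the model.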
Meanwhile, powerful deep learning tools such as recurrent neural networks (RNNs) are able to deal with longitudinal data – i.e. the patient’s history. But they, too, are not powerful enough, by themselves, to deal with the sheer complexity of medicine. So van der Schaar first built a ‘Disease Atlas’ in which she applied a mechanism called ‘attention’, which enables the RNNs to discern which events in the patient’s history are the best predictors of how their disease will progress. (This paper was published in the Machine Learning for Healthcare Conference 2018 and received the best paper award at that year’s IJCAI-BOOM Workshop.) While the predictions made by this sort of RNN are pragmatic and dynamic, these neural networks are essentially black boxes, and so cannot explain, for example, why a particular event has happened, or what the effect of a treatment might be. In short, RNNs do not allow the incorporation or extraction of clinical knowledge.
The key to solving these problems was to keep the useful probabilistic structure and interpretability of HMMs, but use RNNs to model the dynamically evolving state of the patient – the best of both worlds. To this end, in 2018 van der Schaar’s team unveiled a machine learning system called the ‘Attentive State Space’ model. (This method was recently accepted in the top machine learning conference – NeurIPS 2019.) Unlike Markovian state-space models, in which the dynamics are ‘memoryless’, this model uses an ‘attention’ mechanism to create ‘memoryful’ dynamics, whereby the relative weighting of ‘attention’ determines the dependence of future disease states on past medical history. “It combines ideas from statistics with ideas from deep learning. It is this wonderful synergy that gives rise to a variety of good solutions,” she says. “What drives the model is both ‘static’ information acquired about the patient, such as gender or genetic information, as well as the evolution of the patient over time.”
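The idea of ‘memoryful’ dynamics can be illustrated with a small sketch in which the forecast is a weighted mixture over every past state rather than a lookup on the latest one. This is a simplified illustration, not the published Attentive State Space model: the states, the probabilities and the function names are invented, and the attention scores are supplied by hand, whereas in the approach described above they would be produced by a trained neural network.

```python
import math

# Hypothetical three-state disease model; all numbers are invented for illustration.
STATES = ["stable", "progressing", "advanced"]
TRANSITIONS = {
    "stable":      {"stable": 0.8, "progressing": 0.2, "advanced": 0.0},
    "progressing": {"stable": 0.1, "progressing": 0.6, "advanced": 0.3},
    "advanced":    {"stable": 0.0, "progressing": 0.0, "advanced": 1.0},
}


def softmax(scores):
    """Turn arbitrary scores into non-negative weights that sum to one."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_forecast(history, scores):
    """Mix the transition rows of all past states, weighted by attention.

    scores holds one hand-chosen number per past time step (an assumption
    of this sketch); a larger score means more influence on the forecast.
    """
    weights = softmax(scores)
    forecast = {state: 0.0 for state in STATES}
    for past_state, weight in zip(history, weights):
        for next_state, prob in TRANSITIONS[past_state].items():
            forecast[next_state] += weight * prob
    return forecast


# Same current state, different pasts.
patient_a = ["stable", "stable", "stable", "progressing"]
patient_b = ["progressing", "progressing", "progressing", "progressing"]

uniform = [0.0, 0.0, 0.0, 0.0]  # equal attention on every past step
forecast_a = attentive_forecast(patient_a, uniform)
forecast_b = attentive_forecast(patient_b, uniform)
print(forecast_a["advanced"], forecast_b["advanced"])
```

With equal attention on every step, the two hypothetical patients receive different forecasts even though their current state is the same. Putting almost all of the attention on the final step recovers the memoryless behaviour, which is one way to see a Markov model as a special case of this kind of weighting.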
"[This work] combines ideas from statistics with ideas from deep learning. It is this wonderful synergy that gives rise to a variety of good solutions.”
Mihaela van der Schaar, Turing Fellow
In the first half of 2019, the collaborators built a demonstrator system, using real but anonymised data from the NDRS on England’s breast cancer patients – all of those currently alive in the country and many who have died – and a variety of machine learning algorithms built by van der Schaar’s team.
The demonstrator takes information from the pathology report of an individual diagnosed with breast cancer and uses it to make an initial estimate of mortality risk over time, using a technology created by van der Schaar’s team, called AutoPrognosis. Another piece of the suite of technologies (a method called INVASE) can tell which variables are important for this particular patient at this moment in time. But that’s just the beginning. As the patient continues on their healthcare pathway, with further tests, treatments and other health events such as the appearance of another tumour, their data become richer, and the dynamic predictions more accurate. This enables the system to make individualised predictions about the patient’s healthcare pathway, including competing risks (e.g. mortality due to primary cancer, secondary cancer, or risks unrelated to cancer such as cardiovascular risk). Moreover, the ML-AIM system can go beyond predictions of risk: it can also recommend which treatments would be most effective for the specific patient, instead of relying on one-size-fits-all recommendations (see Fig. 1).
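For readers unfamiliar with competing risks, the arithmetic can be sketched with a textbook discrete-time calculation. This is not how AutoPrognosis or the demonstrator works internally: the function name and the yearly hazards below are invented, and in a real system the per-patient hazards would come from a fitted model.

```python
def cumulative_incidence(hazards):
    """Discrete-time cumulative incidence under competing risks.

    hazards is a list with one dict per time step, mapping each cause to
    the probability of dying from that cause during the step, given
    survival up to the start of the step. Returns the cumulative
    incidence per cause and the probability of surviving every step.
    """
    survival = 1.0
    incidence = {}
    for step in hazards:
        for cause, hazard in step.items():
            # Only patients still alive at the start of the step are at risk.
            incidence[cause] = incidence.get(cause, 0.0) + survival * hazard
        survival *= 1.0 - sum(step.values())
    return incidence, survival


# Invented yearly hazards for one hypothetical patient over three years.
yearly = {"primary_cancer": 0.05, "secondary_cancer": 0.01, "cardiovascular": 0.02}
incidence, survival = cumulative_incidence([yearly, yearly, yearly])

for cause, risk in incidence.items():
    print(f"{cause}: {risk:.3f}")
print(f"alive after three years: {survival:.3f}")
```

The point of treating the risks jointly is that the per-cause probabilities and the survival probability always add up to one, so a patient who is likely to die of one cause is correspondingly less likely to be recorded as dying of another.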
Needles in haystacks
“In a clinic, what the doctor wants to know is, how similar are you to other patients?” says Rashbass. After all, that’s how most doctors work: they treat patients based on their knowledge of similar patients. But similarity is hard to determine, says Rashbass. “You can do it very simply, based on the age and sex of the patient, the tumour type and cancer stage. But that isn't good enough.” In the demonstrator, a clinician can say: ‘based on the hundreds of patient variables we've got in the model, find me the patients who are most similar to the patient sat in front of me’. “That is the start point for everything else in medicine,” says Rashbass. “The beauty of the technology we have brought to this, is that it allows us to do deep predictive clustering of individual patients.” (see Fig. 2)
This clustering will allow clinicians to explore a whole range of questions beyond the prognosis for a given patient, by allowing them to see the various probabilities that a patient will follow particular healthcare pathways, based upon the most similar patients in the country, how their cancer progressed and the outcomes of their treatment choices. This source of decision-support for clinicians and patients will be transformative and deliver truly personalised medicine.
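The ‘find me the most similar patients’ query can be illustrated with a much simpler technique than the deep predictive clustering used in the demonstrator: a plain nearest-neighbour search over standardised variables. The sketch below is that simpler stand-in, not the demonstrator’s method; the three variables, the five-patient cohort and the function names are all invented for illustration.

```python
import math


def standardise(rows):
    """Rescale each column to zero mean and unit variance.

    Without this step a variable measured in large units (such as age in
    years) would dominate one measured in small units (such as stage).
    """
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    scaled = [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]
    return scaled, means, stds


def most_similar(query, rows, k):
    """Indices of the k rows closest to query, nearest first."""
    scaled, means, stds = standardise(rows)
    scaled_query = [(x - m) / s for x, m, s in zip(query, means, stds)]
    distances = [(math.dist(scaled_query, row), i) for i, row in enumerate(scaled)]
    return [i for _, i in sorted(distances)[:k]]


# Invented cohort: [age in years, tumour size in mm, stage].
cohort = [
    [45, 12, 1],
    [47, 14, 1],
    [70, 40, 3],
    [68, 35, 3],
    [52, 20, 2],
]
new_patient = [46, 12, 1]
neighbours = most_similar(new_patient, cohort, k=2)
print(neighbours)
```

A search of this kind treats every variable as equally important and ignores outcomes, which is why it falls short of what Rashbass describes: the clustering in the demonstrator is intended to group patients by how their disease is likely to unfold, not merely by how alike their records look.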
What does the future hold?
The current state of medicine, in which population-level science is being applied to individuals, is no longer sufficient, Rashbass believes. “We're working in the dark and making guesses from true randomised controlled trials in how we treat people,” he says. “Population-to-individual does not work well enough. And anyone who has practised medicine knows that a lot of patients don't fit the trial data and don’t actually get better when you intervene with them. What is going to happen in medicine, is that we will see a fundamental shift, to doing things algorithmically.
“We’ve got extraordinary algorithms, we’ve got extraordinary data systems, we’ve got a service that is linking data together. What we want to be able to do is real-time, personalised decision-support, direct to individual patients and clinicians,” says Rashbass. “The idea is to deliver something at scale, right across the NHS.”
There’s lots still to do. The systems being developed in this collaboration will need validation with a wide variety of clinicians, and potentially a full clinical trial, before they can be rolled out. In the meantime, the collaborators are looking at the effects of treatments over time, and improving the system for a variety of cancers. “We are improving data extraction from pathology reports, using natural language processing to harvest more of the information. We are improving the clustering. We are improving the current technology but also adding new technology, and iterating this with a variety of clinicians in Jem’s team,” says van der Schaar.
"What is going to happen in medicine, is that we will see a fundamental shift, to doing things algorithmically."
Jem Rashbass
Beyond cancer treatment
The aim is that these advances will ultimately be felt beyond cancer treatment. For example, AutoPrognosis has already been deployed on the comprehensive patient data held by the Cystic Fibrosis Trust, and produced a 35% improvement in accuracy in predicting the optimal time for people with severe cystic fibrosis to seek a lung transplant (see our Turing Impact Story: Augmenting clinical decision-making). Van der Schaar has also deployed AutoPrognosis on cardiovascular risk, using data from UK Biobank.
These pioneering systems could be applied to any number of health conditions, particularly chronic conditions such as diabetes and Alzheimer’s disease. (Recently, van der Schaar has started a collaboration in this area with Alzheimer's Research UK – the UK’s leading dementia research charity.) That’s because these conditions evolve over time, providing the AI system with ever richer patient data, allowing its patient-specific predictions and recommendations to become more accurate, while also generating clinical insights into disease trajectories and the effects of comorbidities. “With our methodology, the AI system gets better and better when it looks at more diseases and the relationships between them,” says van der Schaar.
Certainly, this is something the government is taking seriously. The UK’s Chief Medical Officer, Professor Dame Sally Davies, describes her 2018 Annual Report as “an aspirational view of what health could and should look like in 2040 if we commit to [health] being our nation’s primary asset”. In a chapter of the report authored by van der Schaar, entitled ‘Machine learning for individualised medicine’, she lays out her vision for a ‘Learning Engine for Healthcare’, which, by utilising the information in patients’ electronic healthcare records, “will produce a holistic view of risk, and the trajectory of risk, for many diseases to which the current patient might be at risk”.
One of the key challenges the Turing has set itself is to revolutionise healthcare – to improve the detection, diagnosis, and treatment of illness through the application of data science and AI. This work has the potential to do exactly that. “I'm a professor now, with hundreds of papers and many awards,” says van der Schaar. “But for me, now that I’m older, the real kick is to see the impact; to see that, potentially, something I do will really help some people's lives. With this system, I can see that happening.”
Note: Several of the quotes in this Impact Story were drawn from a Turing Lecture, hosted by Sir Alan Wilson and featuring Jem Rashbass and Mihaela van der Schaar, entitled “Transforming medicine through AI-enabled healthcare pathways”. The slides accompanying the lecture are available here.