Introduction

Cystic fibrosis (CF) is a genetic condition that affects more than 10,000 people in the UK and causes a range of challenging symptoms, particularly lung dysfunction that eventually leads to progressive respiratory failure. Only half of the people in the UK living with CF are expected to survive beyond the age of 40.

When the respiratory failure of individuals with CF becomes severe, a decision must be taken on whether to refer them for a lung transplant – a major operation that may increase their life expectancy, but which comes with significant risks of its own. The timing of this referral is crucial. Wait too long, and the person may become too unwell to tolerate and recover from a transplant. But referring too early means risking a transplant procedure before it is truly required and, because donor lungs are in short supply, potentially denying a transplant to someone in greater need.

The decision to refer is typically based on a single measure called 'forced expiratory volume' (FEV): when the volume of air a person with CF is able to exhale falls below 30% of healthy levels, they are generally deemed in need of a transplant. Yet research has revealed that just under half of the people referred in this way actually have a sufficiently urgent need for a transplant.

Turing Fellow Mihaela van der Schaar, in collaboration with the Cystic Fibrosis Trust, and making use of a data set representing 99% of CF patients living in the UK, has been applying machine learning methods to improve this referral process. Her team has developed a tool called AutoPrognosis, which suggests correct referrals two-thirds of the time – a remarkable 35% improvement in accuracy over traditional methods. The work, published in Nature Research’s Scientific Reports in July 2018, has the potential to dramatically improve the CF treatment process, and the quality of life of people with the condition.

How did it start?

Van der Schaar and her PhD student, Ahmed M. Alaa, from the University of California, Los Angeles, wanted to know if a systematic, machine-learning-based analysis of CF-patient data could be more effective than current practice. If clinicians could more accurately predict how well a patient might fare without a transplant – for example, how their lung function might change and what their chances of dying are over a given time period – it could assist them in deciding, with their patient, whether it is right to begin talking about a transplant.

The UK’s Cystic Fibrosis Registry, sponsored and managed by the Cystic Fibrosis Trust, supplied the researchers with three years’ worth of anonymised data that represented 99% of all the people in the UK with CF at the time. These data are freely volunteered to the database by people with CF. Since the vast majority of CF-related lung transplants are in adults, the researchers excluded children from their sample, and were left with more than 4,000 adults in the database who had not had a transplant at the start of those three years, and for whom there was follow-up data at the end of the period. The data cover 115 variables, including not only FEV measures over time, but also details of the patients’ demographics, genetics, BMI, infections, previous hospitalisations and much more.

Truly unlocking the powerful predictive potential of this data requires machine learning. Van der Schaar explained why in a podcast interview with The Economist: “It is complicated because cystic fibrosis is a heterogeneous disease, where multiple factors play a role: the different characteristics of the patients are interacting with each other in a complicated way, so we need new methods to identify what variables are important and how they interact with each other for a specific patient, to be able to predict the trajectory of the disease for that specific patient.”

What happened?

The goal was to develop an automatic process that used machine learning to produce accurate clinical prognostic models. To achieve this, the researchers developed an algorithmic framework and provided it with data for the 115 variables associated with each CF patient over three years. They then allowed the framework, called AutoPrognosis, to discover for itself which variables were most important and, crucially, how they interacted with each other to produce a variety of patient outcomes.

While AutoPrognosis found that FEV is indeed the single most important clinical variable, it also provided crucial insights into the importance of other variables in making accurate predictions. For example, oxygenation: variables that reflected disorders in gas exchange in the lungs played a key role in improving the precision, and therefore the usefulness, of the prognostic models.

CF patient outcomes
This figure illustrates the predictions made by AutoPrognosis. The colours indicate the CF patients’ actual outcomes. It shows that the majority of patients that had no adverse outcome were correctly predicted by AutoPrognosis as being of low to moderate risk. Credit: Ahmed M. Alaa & Mihaela van der Schaar.

Using the data from the CF Registry, the researchers used AutoPrognosis to automatically produce a prediction of 3-year mortality. This time horizon is appropriate, as it is a realistic waiting time for a lung transplant. They conducted an extensive analysis of how AutoPrognosis performed, and compared its results with those achieved through the existing medical guidelines, competing clinical models and other machine learning algorithms. They conclude in their research paper that AutoPrognosis “displays clear superiority to all competing methods in terms of both diagnostic accuracy and impact on clinical decision-making”.

So what would the 35% improvement in the accuracy of predictions mean if the system were put into clinical practice? The researchers put it this way: “In a [lung transplant] waiting list of 100 patients, our model would replace 17 patients who were unnecessarily referred for a transplant with 17 other patients who truly needed one.”
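
The arithmetic linking the 35% improvement to the 17 patients can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two precision values (0.48 and 0.65) are chosen to match the article's "just under half" and "two-thirds of the time" and are not quoted from it, and the function names are hypothetical.

```python
# Assumed illustrative figures, chosen to match the article's wording:
# "just under half" -> 0.48, "two-thirds of the time" -> 0.65.
BASELINE_PRECISION = 0.48  # share of FEV-based referrals that were truly needed
MODEL_PRECISION = 0.65     # share of AutoPrognosis referrals that were truly needed


def relative_improvement(baseline: float, model: float) -> float:
    """Relative gain of the model over the baseline, as a fraction."""
    return (model - baseline) / baseline


def patients_replaced(baseline: float, model: float, list_size: int = 100) -> int:
    """Unnecessary referrals swapped for necessary ones on a list of given size."""
    return round((model - baseline) * list_size)


if __name__ == "__main__":
    gain = relative_improvement(BASELINE_PRECISION, MODEL_PRECISION)
    swapped = patients_replaced(BASELINE_PRECISION, MODEL_PRECISION)
    print(f"Relative improvement: {gain:.0%}")
    print(f"Patients replaced per 100 referrals: {swapped}")
```

Under these assumed values, the relative gain in correct referrals is about 35%, while the absolute gain is 17 percentage points, which is where the figure of 17 patients in a waiting list of 100 comes from.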

But this work is not only about helping clinicians decide whether the time is right for a transplant referral. “Some of the biggest challenges so far have been understanding the UK CF Registry data and CF disease more generally and working with clinicians to identify meaningful clinical questions to answer,” says van der Schaar. The outputs of AutoPrognosis can be used to assess a variety of risks to the patient and to quantify the likely severity of future outcomes, which means they could inform treatment planning, follow-up scheduling, and the prediction of a future date at which a transplant referral might be optimal.

“This new research opens up the possibility of a more scientific and data-driven approach to guide the hand of CF teams making these difficult decisions,” Dr Thomas Daniels, a Consultant Respiratory Physician at University Hospital Southampton, told the Cystic Fibrosis Trust. “The next challenge is how to integrate new tools such as this into everyday clinical practice.”

What does the future hold?

“While machine learning has proven successful in making predictions in a clinical setting, its deployment in practice has been limited,” says van der Schaar. “The outcomes of our research with the Cystic Fibrosis Trust demonstrate that with the right in-depth expertise, anonymised data from a large population, and input from clinicians, we can create algorithmic methods to support clinicians in their day-to-day decision-making.”

Professor Andres Floto, of the University of Cambridge and a CF Physician at the Royal Papworth Hospital, has said of the work: “It elegantly demonstrates that machine learning is now ready for the clinic, will have an immediate impact on how we think about who to refer for transplantation, and could have tremendous benefits for individuals with CF.”

Before AutoPrognosis is ready for clinical use, however, the method needs to be thoroughly validated in various clinical settings. Currently, van der Schaar and her students are working closely with Daniels and Floto to build a decision-support system which incorporates AutoPrognosis to enable such validation. (In addition, van der Schaar is now using the rich data of the CF Registry to explore the longitudinal trajectories of the disease in CF patients, to develop a deeper understanding of how different, competing risks change over time.)

A particularly exciting aspect of AutoPrognosis is that it is designed to produce risk predictions for a variety of diseases. Van der Schaar’s team has already applied it to other conditions, including cardiovascular disease and breast cancer. Clearly, this is just the beginning of the enormously beneficial impact that this technology will have on medicine in the UK and beyond. It is a shining example of the sort of innovation the Turing’s Health and Medical Sciences Programme is committed to championing.
