No two cancers are the same. Every cancer reacts differently to treatment, and a treatment that works for one patient might not work for someone else. The growing field of personalised medicine can help to address this with tailor-made therapies, reducing unnecessary treatments and ensuring that the patient gets precisely the care they need. 

But how can oncologists know which therapy to choose? Data science can help. A collaboration between The Alan Turing Institute and Swiss healthcare company Roche has been harnessing the power of big data to explore ways of predicting how lung cancer patients will respond to cancer immunotherapy treatments.

The work described in this report was carried out during a one-week Data Study Group at the Turing in April 2019, followed by a 12-week research project later the same year. It marks the beginning of a partnership which could ultimately benefit the lives of cancer patients around the world.

Flicking the switch

Cancer immunotherapy treatments (CITs) work by helping the body’s immune system to fight off the cancer. Certain types of immunotherapy, known as checkpoint inhibitors, block “checkpoint” proteins that tumours can exploit to switch off the immune response.

“One of these proteins is called PD-L1,” says Ariel Lopez-Chavez, Global Head of Oncology Development Innovation at Roche. “Some tumours make high levels of PD-L1, and these act to switch off a type of white blood cell in the immune system called a T cell. Checkpoint inhibitor drugs can block PD-L1, preventing it from switching off the T cells, so that the T cells are able to recognise and attack the cancer cells again.”

These drugs are sometimes used to treat patients with non-small cell lung cancer (the most common type of lung cancer), either as a first-line treatment if the cancer is advanced, or later if chemotherapy or other drugs have failed. However, the effectiveness of the treatment varies between patients, making it difficult to predict who is most likely to benefit.

A data revolution

Data science offers a potential solution by allowing researchers to look for large-scale patterns in tens of thousands of longitudinal patient records. However, extracting the full value of this data, and understanding the links between the tumours’ genetics, the patients’ treatment journeys and their outcomes, requires developing and applying novel analysis techniques.

In a Data Study Group (DSG) at the Turing in April 2019, 16 researchers from institutes across the UK, with expertise in machine learning and biostatistics, analysed two real-world, observational datasets. The first was a de-identified database derived from Flatiron Health’s US electronic health records (EHRs), which at the time included data on around 52,000 patients diagnosed with advanced non-small cell lung cancer (aNSCLC). The second was the (also US) Flatiron Health-Foundation Medicine Clinico-Genomic Database (CGDB), which links Foundation Medicine’s de-identified comprehensive genomic profiling data with Flatiron’s de-identified, EHR-derived clinical data; it included data on around 5,900 aNSCLC patients who had also undergone comprehensive genomic profiling.

During the week-long event, the team investigated different approaches to analysing the data, with two aims: finding the best computational model for predicting the survival time of aNSCLC patients given CIT as a first-line treatment, and estimating the effectiveness of CIT compared with chemotherapy.

Of the approaches they compared, the researchers found that a machine learning technique known as random forests gave the most accurate predictions of aNSCLC patients’ survival time, using the patients’ lab test results (e.g. levels of albumin, calcium, creatinine and haemoglobin) as inputs to the model. Comparing treatment options, they found that CIT plus chemotherapy increased average patient survival time compared with chemotherapy alone.
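The DSG’s own model code is not described here. As a rough illustration of the idea only, the sketch below builds a tiny random forest in plain Python: each tree is a single split (a “stump”) fitted to a bootstrap resample of the patients using a random subset of the lab features, and the forest averages the trees’ predictions. The four features, their values and the survival times are hypothetical toy data, and the sketch ignores censoring (patients still alive at the end of follow-up), which dedicated methods such as random survival forests are designed to handle.

```python
import random
from statistics import mean

# Hypothetical toy records: (albumin, calcium, creatinine, haemoglobin), survival in months.
# Censoring is ignored here: every survival time is treated as fully observed.
rows = [
    ((3.0, 9.0, 1.0, 11.0), 5.0),
    ((3.1, 9.4, 0.8, 11.5), 6.0),
    ((3.2, 8.8, 1.2, 12.0), 7.0),
    ((4.0, 9.2, 0.9, 12.5), 16.0),
    ((4.2, 8.6, 1.1, 13.0), 18.0),
    ((4.4, 9.6, 1.0, 13.5), 20.0),
]


def best_stump(sample, feature_ids):
    """Return the single split (feature, threshold) with the lowest squared error."""
    best = None
    for f in feature_ids:
        values = sorted({x[f] for x, _ in sample})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between observed values
            left = [y for x, y in sample if x[f] <= t]
            right = [y for x, y in sample if x[f] > t]
            m_left, m_right = mean(left), mean(right)
            sse = sum((y - m_left) ** 2 for y in left) + sum((y - m_right) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, m_left, m_right)
    if best is None:  # no selected feature varies in this sample: predict the sample mean
        m = mean(y for _, y in sample)
        return (0.0, 0, float("inf"), m, m)
    return best


def fit_forest(data, n_trees=50, max_features=2, seed=0):
    """Bagged stumps, each restricted to a random subset of the features."""
    rng = random.Random(seed)
    n_features = len(data[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]              # bootstrap resample
        feats = rng.sample(range(n_features), max_features)  # random feature subset
        _, f, t, m_left, m_right = best_stump(sample, feats)
        forest.append((f, t, m_left, m_right))
    return forest


def predict(forest, x):
    """Average the stumps' predicted survival times for one patient."""
    return mean(m_left if x[f] <= t else m_right for f, t, m_left, m_right in forest)


forest = fit_forest(rows)
high_albumin = (4.5, 9.0, 1.0, 14.0)
low_albumin = (2.9, 9.0, 1.0, 10.5)
print(round(predict(forest, high_albumin), 1), round(predict(forest, low_albumin), 1))
```

Real forests grow much deeper trees; stumps simply keep the example short while still showing the two sources of randomness (bootstrap resampling and random feature subsets).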

“We managed to implement and compare an impressive array of methods, many of which had not previously been considered by Roche,” says Karla Diaz-Ordaz, a biostatistician at London School of Hygiene & Tropical Medicine, who was one of the DSG participants.

“We were impressed by the breadth of ideas that were generated by the DSG, and that so much could be achieved in such a small time,” says Chris Harbron, Expert Statistical Scientist at Roche, and one of the company’s representatives during the challenge. “We were keen to further develop some of these ideas, and so we started discussions with the Turing about what we could do to continue collaborating.”

Going further

The next step in the collaboration was a 12-week research project, which took place at the Turing from August to October 2019.

Through an open call, the Turing recruited two PhD students and a postdoctoral researcher, with expertise across genomics, machine learning and statistics, to work closely with Roche and with Diaz-Ordaz, who had also been recruited through the open call and was now the project’s lead data scientist. The Turing’s wider community of experts was also invited to pitch suggestions for the project.

Working with the same datasets as in the DSG, the team was tasked with improving the models for predicting patient survival, and with quantifying how much the genomic data added to the models’ predictive power.

Using a statistical technique called a ‘stacked Cox’ model, the team showed that adding the patients’ genomic data to their clinical data improved predictions of patient survival. The researchers also confirmed that machine learning methods such as random forests could improve their predictions, because these techniques can capture complex relationships in the data without those relationships having to be specified in advance.
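How the ‘stacked Cox’ model is constructed is not spelled out here, so the sketch below shows only the common building block: a Cox proportional hazards model, fitted by maximising the partial log-likelihood with plain gradient ascent. The two binary covariates (a hypothetical clinical flag and a hypothetical genomic mutation flag), the toy follow-up times and the learning rate are all assumptions for illustration, and the code assumes there are no tied event times.

```python
import math

# Hypothetical toy cohort: follow-up in months, event flag (1 = death observed,
# 0 = censored), covariates [clinical_flag, genomic_flag] (both hypothetical).
times = [2, 3, 5, 7, 11, 13]
events = [1, 1, 0, 1, 1, 0]
X = [[1, 1], [1, 0], [0, 1], [0, 1], [0, 0], [1, 1]]


def linear_scores(X, beta):
    """Linear predictor (log relative hazard) for each patient."""
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def partial_loglik(times, events, X, beta):
    """Cox partial log-likelihood, assuming no tied event times."""
    scores = linear_scores(X, beta)
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored patients appear only in other patients' risk sets
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        ll += scores[i] - math.log(sum(math.exp(scores[j]) for j in risk_set))
    return ll


def gradient(times, events, X, beta):
    """Each event adds its covariates minus the risk-weighted mean of its risk set."""
    scores = linear_scores(X, beta)
    grad = [0.0] * len(beta)
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(scores[j]) for j in risk_set]
        total = sum(weights)
        for k in range(len(beta)):
            weighted_mean = sum(w * X[j][k] for w, j in zip(weights, risk_set)) / total
            grad[k] += X[i][k] - weighted_mean
    return grad


def fit_cox(times, events, X, lr=0.1, n_iter=500):
    """Plain gradient ascent on the (concave) partial log-likelihood."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        g = gradient(times, events, X, beta)
        beta = [b + lr * gk for b, gk in zip(beta, g)]
    return beta


beta_hat = fit_cox(times, events, X)
print(beta_hat, partial_loglik(times, events, X, beta_hat))
```

The coefficients act on the log-hazard, so a positive coefficient marks a covariate associated with a higher risk of death. In practice an established survival library with Newton-type optimisation and tie handling would replace this loop.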

The project also used machine learning methods to look for subgroups of patients who would benefit most from CIT. Using a method known as causal forests, the researchers found that patients’ albumin and PD-L1 protein levels were the most important drivers of differences in response to CIT.
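Causal forests average many trees whose splits are chosen to separate patients with different treatment effects, using honest estimation and, for observational data, adjustments for confounding. The sketch below is a deliberately stripped-down stand-in, not a causal forest: it searches for the single covariate threshold that most separates the estimated treatment effect (mean survival on CIT minus mean survival on chemotherapy) between the two resulting subgroups. The feature names, the toy records and the minimum subgroup size are hypothetical, and a simple difference in means is only valid if treatment were randomly assigned, which it is not in observational EHR data.

```python
from statistics import mean

# Hypothetical toy records: ((albumin, pdl1_percent), treatment, survival_months).
# treatment = 1 for CIT, 0 for chemotherapy. Assumes treatment was randomly
# assigned; real observational EHR data would need confounding adjustment.
FEATURES = ("albumin", "pdl1")
records = [
    ((3.5, 10), 1, 8),
    ((4.0, 20), 0, 8),
    ((3.8, 5), 1, 9),
    ((3.6, 15), 0, 9),
    ((3.9, 60), 1, 20),
    ((3.7, 70), 0, 10),
    ((3.4, 80), 1, 22),
    ((4.1, 90), 0, 12),
]


def treatment_effect(group):
    """Mean survival on CIT minus mean survival on chemotherapy (None if an arm is empty)."""
    treated = [y for _, w, y in group if w == 1]
    control = [y for _, w, y in group if w == 0]
    if not treated or not control:
        return None
    return mean(treated) - mean(control)


def best_effect_split(group, min_size=2):
    """Find the threshold whose two subgroups differ most in estimated treatment effect."""
    best = None
    for f, name in enumerate(FEATURES):
        values = sorted({x[f] for x, _, _ in group})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for r in group if r[0][f] <= t]
            right = [r for r in group if r[0][f] > t]
            if len(left) < min_size or len(right) < min_size:
                continue
            effect_left, effect_right = treatment_effect(left), treatment_effect(right)
            if effect_left is None or effect_right is None:
                continue  # a subgroup without both arms gives no effect estimate
            gap = abs(effect_left - effect_right)
            if best is None or gap > best[0]:
                best = (gap, name, t, effect_left, effect_right)
    return best


gap, feature, threshold, effect_low, effect_high = best_effect_split(records)
print(feature, threshold, effect_low, effect_high)
```

The toy data were built so that higher PD-L1 values go with a larger CIT benefit; a causal forest repeats this kind of split search recursively over many resampled trees and averages the resulting effect estimates.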

“The project brought ideas that we hadn’t considered before and might not have considered otherwise,” says Harbron. “We really appreciated how the Turing scientists, as well as bringing their technical expertise and ideas, also engaged in understanding our data and business problems to come up with relevant solutions.”

Looking forward

These projects have provided important insights for developing data science techniques that could one day help clinicians choose the most effective cancer treatments for their patients.

“The potential application of these methods in a clinical setting can be very valuable in selecting patients that are most likely to benefit from a particular treatment, avoiding unnecessary exposure to treatments that won’t help a patient’s condition,” says Ariel Lopez-Chavez.

“Furthermore, identifying the underlying mechanisms responsible for the lack of response to treatment can help us to develop new strategies for patients with treatment resistance, improving their outcomes.”

This collaboration has provided input into Roche’s personalised healthcare research – one of its key business priorities – and it marks the beginning of a five-year strategic partnership between the Turing and Roche, which aims to continue developing these advanced analytics techniques and ultimately help to make personalised treatments an everyday reality for patients.

Further reading:

Data Study Group final report: Personalised lung cancer treatment modelling using electronic health records and genomics