The current COVID-19 pandemic has greatly impacted breast cancer diagnosis and treatment, with almost one million women in the UK estimated to have missed vital breast cancer screenings this year. In this blog, written to reflect on Breast Cancer Awareness Month, which takes place every October, Turing Fellow Andrew Holding discusses how we can use machine learning to understand the biology that underlies this form of cancer and help improve treatments.


In 2001, the drug Gleevec found its way onto the cover of Time Magazine with the headline: “There is new ammunition in the war against cancer. These are the bullets.” A claim that would turn out to be totally justified. Gleevec now stands as one of the greatest successes in treatment development. Back in 1980, fewer than 10% of patients with chronic myeloid leukaemia would survive ten years; today, over 90% will survive thanks to Gleevec.

Unfortunately, these successes are not universal. A positive outcome for one patient does not guarantee the same treatment will be beneficial for another. Different cancers are vastly different diseases, and most types of cancer have further multiple subtypes. For example, breast cancer responds very differently to therapy if the cancer cells originate from the milk-producing lobules rather than the walls of the ducts that carry the milk. The challenge is further complicated by the different molecular subtypes, defined by which genes are, or are not, activated in the cancer cells within the tumour. Each of these features can make the cancer respond differently to any treatment. 

Given the complexity of the disease, it should be no surprise that the biggest successes so far in the fight against cancer have not been precisely targeted treatments like Gleevec. Instead, it has been awareness campaigns or interventions, such as informing people of the risks of smoking, that have had some of the biggest effects in preventing cancer deaths. 

Patient data to guide healthier lifestyles

Today, the NHS generates more data than ever before. This vast resource presents an opportunity to follow the leads of Professors Richard Doll and Austin Bradford Hill, who undertook the first large-scale study exploring the link between smoking and cancer, in finding lifesaving links between lifestyle and disease. However, the large scale of the data also presents a challenge. How does one even begin to read it all? 

Fortunately, through modern AI and machine learning methods, computers are superpowered to read large amounts of data and recognise patterns, although this is not without challenges. Gaps exist in patient records because not all patients go to the GP for the same symptoms, and we must develop ways to address these. However, by developing strategies for digitising and codifying patient data, we can use machine learning to identify subtle trends that put us at risk of cancer. The same technology can also spot links between apparently unrelated symptoms to help a GP diagnose earlier. The potential of such technology is enormous, helping us identify new ways to reduce our cancer risk and to provide patients with specific personal advice. In the case of breast cancer, occurrence, relapse rates and mortality vary widely for people from different countries and cultures. AI and machine learning offer an important tool to identify these risk factors and make it possible to improve patient outcomes for all.

AI technologies can also support screening programmes and have been proposed as a solution to the backlog caused by COVID-19. By looking for patterns in the small details, either in the images of biopsy samples or in a patient’s records, we can focus screening on those who most need it. This strategy provides an opportunity that benefits everyone, by minimising the chance of false positive results for cancer that cause unnecessary distress to patients.

Learning biology with AI

AI methods are not just limited to patient records and diagnosis - this is where our research comes in. It moves away from healthcare and looks at the biology that underlies cancer. We hope to use machine learning to understand more about cancer cells and find new weaknesses that we may exploit with new therapies when current treatments don’t work. 

In 70% of breast cancer cases, it is the cell’s response to the sex hormone oestrogen that becomes miswired. Drugs targeting this process have been developed, blocking the oestrogen message and successfully halting the tumour’s growth. The treatment works well for many patients, but over time relapses occur, and the cancer cells start to grow again. 

To bypass this adaptation by the tumour, we hope to build a map of the wiring diagram that defines the oestrogen response. Then, using this information, we can target the connections carrying the signals that cause the cancer cells to grow again. However, studying the different parts of the cell can be slow and laborious. To resolve this challenge, we reduced the size of each experiment down to a single cell. We used CRISPR/Cas9 technology: Cas9 acts as a pair of molecular scissors within each cell, and we can guide these scissors to a set location to cut the DNA and inactivate a single gene in a single cell. The inactivation of different genes breaks different connections in the signalling network, and we record how this changes the cell's response to oestrogen. Because we can do this on a single cell at a time, we can do thousands of experiments in a single petri dish. This kind of data is perfect for machine learning because, on its own and at such a small scale, the data from each cell is unclear. However, when we combine the thousands of results from each cell, machine learning can slowly build a picture of the patterns in the data.
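To give a feel for why combining thousands of noisy single-cell results helps, here is a minimal toy sketch in Python. It is not our actual analysis pipeline: the gene labels, effect sizes and noise level are all invented for illustration, and it uses simple averaging per inactivated gene rather than a real machine learning model. The point is only that a signal too faint to see in any one cell can emerge once many cells are pooled.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical gene labels and effect sizes, invented purely for illustration.
TRUE_EFFECT = {"control": 0.0, "GENE_A": -1.0, "GENE_B": -0.3, "GENE_C": 0.5}
NOISE_SD = 3.0  # assumption: per-cell noise is much larger than any true effect


def simulate_cells(n_cells, seed=0):
    """Simulate one noisy 'oestrogen response' reading per cell.

    Each cell has exactly one (randomly chosen) gene inactivated.
    """
    rng = random.Random(seed)
    genes = list(TRUE_EFFECT)
    cells = []
    for _ in range(n_cells):
        gene = rng.choice(genes)
        response = TRUE_EFFECT[gene] + rng.gauss(0.0, NOISE_SD)
        cells.append((gene, response))
    return cells


def estimate_effects(cells):
    """Pool cells by inactivated gene and compare each group to the controls."""
    by_gene = defaultdict(list)
    for gene, response in cells:
        by_gene[gene].append(response)
    baseline = mean(by_gene["control"])
    return {gene: mean(values) - baseline for gene, values in by_gene.items()}


cells = simulate_cells(20000)
effects = estimate_effects(cells)
for gene in TRUE_EFFECT:
    print(f"{gene}: true {TRUE_EFFECT[gene]:+.2f}, estimated {effects[gene]:+.2f}")
```

Any single simulated cell tells us almost nothing, because the noise swamps the effect. Pooled over thousands of cells, the estimates settle close to the values we put in, which is the same intuition behind letting machine learning work across the whole dish.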

The patterns in the data will reveal how various parts of the cell are wired together. We will then use this information to target the wiring that drives breast cancer growth and shut down the messages it carries, blocking not just the one route but also all the potential ways around our molecular roadblock. Our results won't necessarily require the development of new therapeutics either. Instead, it may be enough to combine those that already exist, using our new-found knowledge to predict new ways current therapies may function together to benefit patients.

That doesn’t mean we will stop looking for silver bullets like Gleevec, but we have started to accept that they are few and far between. Fortunately, AI and machine learning provide new opportunities to prevent and diagnose cancer, and even to interpret its complex biology, making continued progress possible. Of course, these technologies come with their own technical and ethical challenges, but as long as we recognise that, we should be optimistic about the benefits they can bring to cancer patients.