Approximately 75% of breast cancers are classified as estrogen receptor (ER) positive. In these cancers, the ER drives tumour growth and is the key therapeutic target. However, resistance develops in up to approximately 40% of patients, who then no longer respond to ER-targeted therapy. This project aims to develop methods for applying machine learning to data generated in the lab, in order to discover new potential targets for treating patients with resistant ER-positive breast cancer.

Explaining the science

The data in this project will be analysed with an approach similar to neural network image classification. Instead of images, the network will be trained on thousands of matrices, one per cell, each representing the perturbations that cell received. Instead of image annotations (e.g. cat vs. dog), the labels will be the effect of those perturbations on the cell's transcriptional state. To aid learning, the transcriptional state can be simplified into scores for gene sets that reflect the phenotypes to be predicted, such as toxicity or proliferation.
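The analogy with image classification can be illustrated with a minimal sketch. It assumes (hypothetically) that each cell receives up to three perturbations in sequence from a panel of five targets, encodes each cell as a one-hot matrix whose rows record the order of perturbations and whose columns record the target, and summarises a transcriptional profile as the mean expression of one gene set. The genes, gene set, effect matrix and data are all made up for illustration, and the small numpy network stands in for whatever architecture the project eventually uses.

```python
import numpy as np

# Hypothetical dimensions: up to MAX_STEPS sequential perturbations per cell,
# each chosen from a panel of N_TARGETS candidate targets.
N_TARGETS = 5
MAX_STEPS = 3


def encode_perturbations(sequence, n_targets=N_TARGETS, max_steps=MAX_STEPS):
    """One matrix per cell: row = position in the perturbation order, column = target."""
    m = np.zeros((max_steps, n_targets))
    for step, target in enumerate(sequence):
        m[step, target] = 1.0
    return m


def gene_set_score(expression, gene_names, gene_set):
    """Summarise a transcriptional profile as the mean expression of one gene set."""
    idx = [i for i, g in enumerate(gene_names) if g in gene_set]
    return float(np.mean(expression[idx]))


def train_mlp(X, y, hidden=16, lr=0.05, epochs=3000, seed=0):
    """One-hidden-layer network fitted by full-batch gradient descent on mean squared error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(0, 0.5, (d, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0, 0.5, (hidden, 1)), np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                 # hidden activations
        pred = h @ W2 + b2                       # predicted gene-set score, shape (n, 1)
        g_pred = 2.0 * (pred - y[:, None]) / n   # d(MSE)/d(pred)
        g_h = (g_pred @ W2.T) * (1.0 - h ** 2)   # backpropagate through tanh
        W2 -= lr * (h.T @ g_pred)
        b2 -= lr * g_pred.sum(axis=0)
        W1 -= lr * (X.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return lambda X_new: (np.tanh(X_new @ W1 + b1) @ W2 + b2).ravel()


# Toy data only: random perturbation orders and a made-up effect matrix.
rng = np.random.default_rng(1)
sequences = [rng.choice(N_TARGETS, size=rng.integers(1, MAX_STEPS + 1), replace=False)
             for _ in range(300)]
X = np.stack([encode_perturbations(s).ravel() for s in sequences])  # flatten, like image pixels
gene_names = [f"gene{i}" for i in range(8)]        # hypothetical genes
proliferation_set = {"gene0", "gene1", "gene2"}    # hypothetical gene set
# Separate rows per (step, target), so the simulated effect depends on perturbation order.
effects = rng.normal(0, 1, (X.shape[1], len(gene_names)))
expression = X @ effects + rng.normal(0, 0.1, (len(X), len(gene_names)))
y = np.array([gene_set_score(e, gene_names, proliferation_set) for e in expression])

predict = train_mlp(X, y)
mse = float(np.mean((predict(X) - y) ** 2))
print(f"training MSE {mse:.4f} vs. variance of labels {np.var(y):.4f}")
```

Because the row index encodes when each target was hit, the same set of targets applied in a different order produces a different input matrix, which is what allows ordering effects to be learned at all.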

The resulting neural network will have properties that make this project particularly interesting from a data science perspective. The underlying structure the network must learn is itself a biological network, and this feature of the data can be exploited to explore it further. For example, the first layer of the neural network can be constrained to represent protein-protein interactions between ER cofactors, with deeper layers representing higher-order interactions. A second opportunity is to apply adversarial neural network methods to predict the combination of perturbations that achieves the best outcome. Finally, because the data records the order in which perturbations were applied, it is possible to explore how the order in which therapeutic targets are perturbed affects patient outcome. Together, these opportunities show the value of generating this data.
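Constraining the first layer to protein-protein interactions amounts to a dense layer whose weight matrix is multiplied element-wise by a fixed binary mask. The sketch below assumes hypothetical target names (T1 to T4), hypothetical cofactor units (C1 to C3) and a made-up interaction list; in practice the edges would come from a curated interaction resource. Each first-layer unit stands for one cofactor and only receives input from perturbed targets that interact with it.

```python
import numpy as np

# Hypothetical perturbed targets (inputs) and ER cofactors (first-layer units).
targets = ["T1", "T2", "T3", "T4"]
cofactors = ["C1", "C2", "C3"]
# Made-up interactions for illustration only, not real biology.
interactions = {("T1", "C1"), ("T2", "C1"), ("T2", "C2"), ("T3", "C3"), ("T4", "C3")}


def ppi_mask(targets, cofactors, interactions):
    """Binary matrix: mask[i, j] = 1 if target i interacts with cofactor j."""
    mask = np.zeros((len(targets), len(cofactors)))
    for i, t in enumerate(targets):
        for j, c in enumerate(cofactors):
            if (t, c) in interactions:
                mask[i, j] = 1.0
    return mask


class MaskedDense:
    """Dense layer whose connections exist only where the interaction mask is 1."""

    def __init__(self, mask, seed=0):
        rng = np.random.default_rng(seed)
        self.mask = mask
        self.W = rng.normal(0, 0.5, mask.shape) * mask  # start with non-edges at zero
        self.b = np.zeros(mask.shape[1])

    def forward(self, x):
        # Re-applying the mask is redundant if updates are masked, but keeps the constraint explicit.
        return np.tanh(x @ (self.W * self.mask) + self.b)

    def update(self, grad_W, lr):
        # Masking the gradient keeps non-interacting connections at exactly zero.
        self.W -= lr * grad_W * self.mask


layer = MaskedDense(ppi_mask(targets, cofactors, interactions))
out = layer.forward(np.array([[1.0, 0.0, 0.0, 0.0]]))  # perturb T1 only
print(out)  # only the C1 unit can respond
```

Deeper, unmasked layers stacked on top of these cofactor units could then model the higher-order interactions described above, while the first layer stays interpretable as a map of cofactor activity.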

Project aims

The aim of the project is to develop the experimental tools and computational methods needed to generate and interpret data for investigating potential combination therapies that bypass resistance and minimise toxicity in the most common form of breast cancer.

The project will achieve this aim by using the latest single-cell sequencing technologies to generate data containing detailed information on both how the order of potential interventions affects the outcome and how different combinations of therapeutic targets provide beneficial effects, in treatment-responsive as well as resistant breast cancer models.

Because of its complexity, the data is ideally suited to machine learning methods, in particular neural networks. Analysing it will provide insight into the regulation of the drivers of breast cancer and predict therapeutic solutions to overcome resistance for pre-clinical evaluation.


The outcomes of this project will provide technologies that can be applied to a wide range of diseases to discover potential new combination therapies. In addition, the results will form the basis of future research aimed at improving patient outcomes.