Current biological knowledge is concentrated in a relatively small proportion of well-known genes and proteins. However, there is also a plethora of large-scale biological datasets, derived from genome-wide experiments, that cover all human proteins or their genes. To predict functions for the many proteins whose function is unknown, this project will use a deep learning approach to predict functionally coupled protein pairs from correlations in genomic, proteomic and phylogenetic input data. The work aims to assign uncharacterised proteins to known biological processes and potentially to reveal previously unknown biological pathways.
Explaining the science
Artificial neural networks have a proven track record of using pattern recognition to map diverse sources of complex, multi-dimensional input data to simple output data. This project exploits that ability to combine diverse types of biological data and answer a relatively simple question: do two proteins work together for some biological purpose?
This project endeavours to exploit the wealth of somewhat underused large-scale biological information to detect proteins that work together within the same biological pathway. By learning from the large number of biological pathways that are already known, the project aims to use deep neural networks to discover previously unknown biological connections between different genes and proteins.
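The idea of turning correlations across several data sources into a single yes/no question about a protein pair can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual model: the protein identifiers, the toy profiles, the choice of Pearson correlation as the per-source feature, and the hand-set network weights are all hypothetical. In practice the weights would be learned from pairs of proteins in known pathways.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / (sx * sy)

def pair_features(protein_a, protein_b, sources):
    """Build one correlation feature per data source for a protein pair."""
    return [
        pearson(sources[name][protein_a], sources[name][protein_b])
        for name in sorted(sources)  # fixed order so features line up
    ]

def coupling_score(features, hidden_weights, hidden_biases, out_weights, out_bias):
    """Forward pass of a one-hidden-layer network (ReLU, then sigmoid)."""
    hidden = [
        max(0.0, sum(w * f for w, f in zip(ws, features)) + b)
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    z = sum(w * h for w, h in zip(out_weights, hidden)) + out_bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy data: two sources, three proteins.
SOURCES = {
    "expression": {"P1": [1, 2, 3, 4], "P2": [2, 4, 6, 8], "P3": [4, 3, 2, 1]},
    "phylogenetic": {"P1": [1, 0, 1, 1], "P2": [1, 0, 1, 1], "P3": [0, 1, 0, 0]},
}

# Illustrative, hand-set weights (assumption); a real model would learn these.
HIDDEN_W, HIDDEN_B = [[1.0, 1.0]], [0.0]
OUT_W, OUT_B = [2.0], -1.0

def score_pair(a, b):
    return coupling_score(pair_features(a, b, SOURCES), HIDDEN_W, HIDDEN_B, OUT_W, OUT_B)
```

With these toy profiles, `score_pair("P1", "P2")` is higher than `score_pair("P1", "P3")`, because P1 and P2 are positively correlated in both sources while P1 and P3 are anti-correlated.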
The project aims both to link uncharacterised proteins with existing biological pathways and to identify clusters of proteins in entirely new pathways. This will allow researchers to begin to understand the roles of 'unknown' genes, those for which there is almost no functional annotation. For the human genome, this corresponds to at least 30% of genes.
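One simple way to picture how pairwise predictions could yield clusters of proteins is to treat each confidently predicted pair as an edge in a graph and read off the connected groups. The sketch below assumes this connected-components approach as a stand-in; the source does not specify the clustering method, and the function name `pathway_clusters`, the 0.5 threshold and the protein labels are all hypothetical.

```python
def pathway_clusters(pair_scores, threshold=0.5):
    """Group proteins into connected components over pairs scoring >= threshold.

    pair_scores maps (protein_a, protein_b) tuples to a predicted coupling score.
    """
    neighbours = {}
    for (a, b), score in pair_scores.items():
        neighbours.setdefault(a, set())
        neighbours.setdefault(b, set())
        if score >= threshold:  # keep only confident predictions as edges
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, clusters = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters

# Hypothetical predicted scores for five proteins.
example_scores = {
    ("A", "B"): 0.9,
    ("B", "C"): 0.8,
    ("C", "D"): 0.1,  # below threshold, so it does not join the two groups
    ("D", "E"): 0.7,
}
clusters = pathway_clusters(example_scores)
```

A cluster that contains only uncharacterised proteins would be a candidate new pathway, while an uncharacterised protein that falls into a cluster of annotated proteins would be a candidate new member of a known one.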
The project's predictions are testable using direct experimental assays, for example fluorescence microscopy or co-purification, and can inform a wide variety of biological contexts, including medicine, industry and agriculture.
This research seeks to inform fundamental molecular biology and is thus of potential benefit to all areas of biology and medicine. The greatest initial impact will be to inform other biologists about new components of known pathways, or even new biological pathways.