Introduction

Recent advances in single-cell technologies allow us to profile cellular identities in unprecedented detail. However, the usefulness of these technologies depends on mathematical and computational methods that can reliably extract meaningful biological information. This project combines state-of-the-art single-cell profiling data with advances in data analytics and network modelling that go beyond clustering to identify combinations of genes that oscillate in a coordinated way. Genetic oscillators play important roles in regulating the timing of cellular dynamics, and detecting them is essential for correctly identifying cell types and the transitions between cell states.

Explaining the science

Network models have been extremely successful at revealing structural and dynamical properties of real-world complex systems and are used extensively in systems biology. In particular, gene expression patterns are ultimately controlled by complex underlying genetic networks. The detailed dynamics of these networks are typically not fully known, but partial information about some interactions is often available in the form of curated molecular regulatory networks. This information can be used to better represent the gene expression profiles of cells and, in particular, to better understand the dynamical transitions from one cell state to another.

Network models are limited in that they encode only pairwise relations (whether two nodes, say two genes, are connected by a link). A natural generalisation is to consider triangles (fully connected triples of nodes), tetrahedra (fully connected sets of four nodes) and higher-order fully connected sets of nodes. This combinatorial structure, called a simplicial complex, can be constructed from data by treating each gene as a node and connecting two genes if they have similar expression patterns across a cell population. Although this process depends on a choice of scale (a threshold distance below which pairs of nodes are connected), the calculations can be made robust by varying this scale parameter and considering only those features that persist over a wide range of scales.
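
To make the construction concrete, here is a minimal Python sketch under stated assumptions. The text does not fix a similarity measure, so the distance between two genes is assumed here to be 1 minus the Pearson correlation of their expression profiles across cells, and the complex is truncated at triangles. The gene names and values in `TOY_PROFILES` are hypothetical, as are the helper names; this illustrates the idea and is not the project's own pipeline.

```python
import math
from itertools import combinations

# Hypothetical toy data: expression of four genes across six cells.
TOY_PROFILES = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [1.1, 2.1, 2.9, 4.2, 5.0, 6.1],  # tracks geneA closely
    "geneC": [1.0, 2.2, 3.1, 3.9, 5.2, 5.8],  # tracks geneA closely
    "geneD": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],  # opposite pattern to geneA
}


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def gene_distance(x, y):
    """Assumed dissimilarity: 0 for identical patterns, 2 for opposite ones."""
    return 1.0 - pearson(x, y)


def clique_complex(profiles, eps):
    """Genes, edges and triangles of the simplicial complex at scale eps.

    Two genes are linked when their distance is at most eps; a triple of
    genes is filled in as a triangle when all three of its pairs are linked.
    """
    genes = sorted(profiles)
    edges = [
        (g, h)
        for g, h in combinations(genes, 2)
        if gene_distance(profiles[g], profiles[h]) <= eps
    ]
    linked = set(edges)
    triangles = [
        (a, b, c)
        for a, b, c in combinations(genes, 3)
        if {(a, b), (a, c), (b, c)} <= linked
    ]
    return genes, edges, triangles


if __name__ == "__main__":
    # Sweep the scale parameter and report how the complex grows.
    for eps in (0.001, 1.0, 2.5):
        genes, edges, triangles = clique_complex(TOY_PROFILES, eps)
        print(f"eps={eps}: {len(genes)} genes, {len(edges)} edges, "
              f"{len(triangles)} triangles")
```

At `eps=1.0` the three co-varying genes form a single filled triangle while `geneD` stays isolated, and this grouping is unchanged for every threshold between roughly 0.01 and 1.99; that stability across scales is the kind of persistence the paragraph above relies on. Only at `eps=2.5` does every pair become linked (6 edges, 4 triangles).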

This approach belongs to an area known as topological data analysis (TDA). Topology is the mathematical study of ‘shape’, and the aim of TDA is to detect and quantify high-dimensional ‘shapes’ in data. This makes it particularly well suited to detecting the signature of genetic oscillators in gene expression data. Just as static gene expression patterns give rise to clusters in data, patterns of oscillation give rise to closed ‘loops’. These loops are difficult to identify with standard analysis methods (particularly in high dimensions), but they leave a topological signature that can be identified by TDA techniques adapted to this particular type of data.
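
The claim that an oscillation leaves a loop can be checked on a toy example. The Python sketch below makes several assumptions that are not in the text: cells are treated as points in gene-expression space with Euclidean distance, the complex is a Vietoris-Rips complex (pairs within distance `eps` are linked and every fully linked triple is filled in), and loops are counted by the first Betti number, computed with coefficients mod 2 from the complex's edges and triangles. The helper names and the synthetic "oscillating" and "linear" cell populations are hypothetical. It recomputes the loop count independently at a few scales, which is only a crude stand-in for full persistent homology, where each feature is tracked across the whole range of scales.

```python
import math
from itertools import combinations


def gf2_rank(columns):
    """Rank over GF(2) of a 0/1 matrix whose columns are given as int bitmasks."""
    basis = {}  # leading bit -> basis vector with that leading bit
    rank = 0
    for v in columns:
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                rank += 1
                break
            v ^= basis[lead]  # eliminate the leading bit and keep reducing
    return rank


def betti1(points, eps):
    """Number of independent loops in the Vietoris-Rips complex at scale eps.

    Uses b1 = #edges - rank(d1) - rank(d2), where d1 maps edges to their
    endpoints and d2 maps triangles to their edges (coefficients mod 2).
    Assumption: Euclidean distance between cells in expression space.
    """
    n = len(points)
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= eps]
    index = {e: k for k, e in enumerate(edges)}
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if (i, j) in index and (i, k) in index and (j, k) in index]
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    d2 = [(1 << index[(i, j)]) | (1 << index[(i, k)]) | (1 << index[(j, k)])
          for i, j, k in triangles]
    return len(edges) - gf2_rank(d1) - gf2_rank(d2)


def oscillating_cells(n_cells=12):
    """Hypothetical cells caught at evenly spaced phases of one cycle:
    gene 1 follows cos(phase), gene 2 lags it by a quarter period."""
    phases = [2 * math.pi * c / n_cells for c in range(n_cells)]
    return [(math.cos(p), math.sin(p)) for p in phases]


def linear_cells(n_cells=12):
    """Hypothetical control: both genes rise steadily along one trajectory."""
    return [(c / (n_cells - 1), c / (n_cells - 1)) for c in range(n_cells)]


if __name__ == "__main__":
    for eps in (0.3, 0.7, 1.2, 2.1):
        print(f"eps={eps}: oscillating b1={betti1(oscillating_cells(), eps)}, "
              f"linear b1={betti1(linear_cells(), eps)}")
```

For the oscillating population this gives b1 = 0, 1, 1, 0 at eps = 0.3, 0.7, 1.2, 2.1: no loop before neighbouring cells connect, a single loop over an intermediate range of scales (from the nearest-neighbour spacing of about 0.52 up to at least 1.2), and none once every cell is linked to every other. The linear control gives 0 at every scale, so it is the persistent loop, not any one threshold, that distinguishes the oscillation.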

Project aims

This project aims to develop computational methods that automatically detect groups of genes oscillating in a coordinated way in the wealth of complex data supplied by modern cell-profiling techniques. Dr Sanchez Garcia's approach combines network modelling and topology to recognise the topological signatures of oscillatory behaviour in single-cell expression data.

Once new genetic oscillators have been identified, their presence and significance can be verified experimentally, and they can then be used to characterise cell types and transitions between cell states. This is crucial for understanding cellular dynamics, because the expression of many important genes is known to vary dynamically within individual cells.

The project uses mouse embryonic stem cell data already produced at the University of Southampton. Embryonic stem cells have the potential to differentiate into any type of cell in the body, a property known as pluripotency. Understanding the principles and processes that govern pluripotency and differentiation is key to developing stem cell therapies, as well as to advancing fundamental biological knowledge.

Applications

Advances in understanding the basic biology of stem cell differentiation will be critical if stem cells are to be used safely and routinely in clinical applications. By detecting oscillators and other complex dynamical behaviour, we can better identify cell types and transitions between states. This has the potential to contribute to future personalised medicine by providing more accurate molecular diagnostics from single-cell gene expression data. For instance, it could benefit patient stratification for predicting drug response rates in a population cohort, selecting optimal cohorts for drug trials, or identifying priority candidates for disease sub-phenotypes.

The project's results will also contribute directly to biological research and open up new research directions in the use of network and topological methods for the qualitative study of complex biological systems.

Organisers