Matching cutting-edge experimental protocols with appropriate analysis tools is an important part of biological data science. This project aims to develop statistical methodology tailored to datasets generated by single-cell mRNA sequencing (scRNA-seq). These experiments allow scientists to quantify the expression of thousands of genes on a cell-by-cell basis. This information is critical for understanding complex biological processes, such as those underlying cancer.
Beyond the experimental challenges, the statistical analysis of scRNA-seq datasets is a challenge in its own right. An important task is to de-noise the data, separating biological signal from technical artefacts. Appropriate analysis tools are therefore critical for extracting data-driven biological insights that are robust to technical variation.
BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrative model that borrows information from different data sources to achieve this goal. In supervised studies, BASiCS uses a probabilistic decision rule to identify changes in gene expression between two or more pre-specified populations of cells. Our ongoing work extends BASiCS to accommodate complex experimental designs, including unsupervised studies where the aim is to discover and characterise novel cell populations.