Matching cutting-edge experimental protocols with appropriate analysis tools is an important part of biological data science. Single-cell mRNA sequencing (scRNA-seq) allows scientists to understand complex biological processes, such as cancer. This project aims to develop statistical methodology to fully exploit the rich information contained in scRNA-seq datasets.
Explaining the science
Single-cell mRNA sequencing (scRNA-seq) allows scientists to quantify gene expression - the process by which information from a gene is used in the synthesis of a functional gene product - for thousands of genes on a cell-by-cell basis.
Statistical analysis of scRNA-seq datasets is necessary to gain a greater understanding of biological processes. This analysis involves significant challenges, both experimental and technical. In particular, the data must be de-noised, separating biological signal from technical artefacts. Developing appropriate analysis tools in this context is critical to extract data-driven biological insights that are robust to technical variation.
This lecture from the 'Machine Learning for Personalized Medicine' Summer School 2015 provides a useful introductory resource.
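To make the idea of separating biological signal from technical noise concrete, here is a toy sketch in Python. It is not the BASiCS model: it assumes, purely for illustration, that technical noise is Poisson sampling noise and that biological variation multiplies each cell's expected count by a gamma-distributed factor with mean 1 and variance `delta`. Under that assumption the variance of the counts is `mean + delta * mean**2`, so the excess over the Poisson variance gives a rough moment-based estimate of the biological component. All names and parameter values below are hypothetical.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_counts(rng, mean, delta, n_cells):
    """Simulate counts for one gene across cells (toy model, an assumption).

    delta = 0 gives pure Poisson (technical) noise; delta > 0 adds
    cell-to-cell biological variation via a gamma factor with mean 1
    and variance delta.
    """
    counts = []
    for _ in range(n_cells):
        factor = rng.gammavariate(1.0 / delta, delta) if delta > 0 else 1.0
        counts.append(poisson(rng, mean * factor))
    return counts


def excess_variability(counts):
    """Moment estimate of delta: (variance - mean) / mean**2."""
    m = statistics.mean(counts)
    v = statistics.variance(counts)
    return (v - m) / (m * m)


rng = random.Random(1)
stable_gene = simulate_counts(rng, mean=20, delta=0.0, n_cells=2000)
variable_gene = simulate_counts(rng, mean=20, delta=0.5, n_cells=2000)

stable_estimate = excess_variability(stable_gene)
variable_estimate = excess_variability(variable_gene)
print(f"stable gene:   {stable_estimate:.3f}")
print(f"variable gene: {variable_estimate:.3f}")
```

The first gene should give an estimate close to zero and the second an estimate near the simulated value of 0.5. Real scRNA-seq data need a far richer model than this, which is the motivation for the methodology described here.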
This project involves developing and refining a model called BASiCS (Bayesian Analysis of Single-Cell Sequencing data), which combines information from different data sources to extract insights from the data.
In supervised learning studies, BASiCS uses a probabilistic decision rule to identify changes in gene expression between two or more pre-specified populations of cells.
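A probabilistic decision rule of this kind can be sketched in a few lines of Python. The sketch below is an illustration only and does not use the BASiCS package or its interface. It assumes that posterior draws of a gene's log-fold change between two populations are already available (here they are simulated), and it flags a gene when the posterior probability that the absolute log-fold change exceeds a tolerance `tau` is greater than `prob_threshold`. The function names, gene names and both threshold values are hypothetical choices.

```python
import random


def tail_probability(lfc_draws, tau):
    """Fraction of posterior draws whose |log-fold change| exceeds tau."""
    return sum(abs(d) > tau for d in lfc_draws) / len(lfc_draws)


def flag_genes(posterior_draws, tau=0.4, prob_threshold=0.9):
    """Flag each gene whose tail probability exceeds prob_threshold."""
    return {
        gene: tail_probability(draws, tau) > prob_threshold
        for gene, draws in posterior_draws.items()
    }


# Simulated posterior draws (an assumption standing in for model output).
rng = random.Random(0)
posterior_draws = {
    "gene_A": [rng.gauss(1.5, 0.2) for _ in range(2000)],  # clear change
    "gene_B": [rng.gauss(0.0, 0.2) for _ in range(2000)],  # no change
}

flags = flag_genes(posterior_draws)
for gene, changed in flags.items():
    print(gene, "changed" if changed else "not flagged")
```

Working with a posterior probability rather than a point estimate lets the rule account for uncertainty: a gene with a large but poorly determined fold change is not flagged unless most of its posterior mass lies beyond the tolerance.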
Ongoing work extends BASiCS to accommodate complex experimental designs, including unsupervised studies where the aim is to discover and characterise novel cell populations.
BASiCS is available in Bioconductor.