Data science challenges in high-throughput biology and precision medicine

Edinburgh, 25th-27th November 2015

Main organisers: Guido Sanguinetti, Mark Girolami, David Rand, Richard Savage, Florian Markowetz

Biomedical science has been one of the most important application domains for computational and statistical modelling for decades. Recent technological advances in high-throughput biology, such as next-generation sequencing, are potentially heralding a step change in biomedicine, enabling scientists to generate very large data sets on multiple levels of biological organisation. These disruptive new technologies hold enormous potential for understanding the molecular mechanisms underlying human disease; however, they pose fundamental challenges to data science. It is increasingly clear that unlocking this potential will require both a deeper integration of data modelling at all steps of the biomedical research pipeline and the solution of several methodological challenges brought about by the scale, heterogeneity and variability intrinsic to the data. How can we build statistical models that integrate the many layers of high-dimensional data (genetic, epigenetic, gene/protein expression, phenomics…), often sampled at very different levels (>10^5 individuals in genetics studies, approximately 10^3 for epigenetic studies, fewer for gene expression)? How do we characterise inter-individual and inter-cell variability, and how does this variability affect predictive models? How do statistical issues inform policy regarding future large data collection initiatives? How does one rapidly disseminate statistical methodological advances into biomedical, and ultimately clinical, research? This workshop will bring together leading statistical modelling and biomedical expertise from the UK community, as well as industrial and funding body representatives, to discuss the present and future challenges for data science and biomedicine in realising the translational potential of high-throughput molecular data.