Andrew Beggs is a Reader in cancer genetics & surgery at the University of Birmingham. He obtained his PhD in 2011 from research undertaken in the laboratory of Professor Ian Tomlinson (University of Oxford) and moved to Birmingham in 2012, where he spent five years undertaking a Wellcome Trust postdoctoral fellowship for clinician scientists in the Institute of Cancer & Genomic Sciences. He obtained a Cancer Research UK Advanced Clinician Scientist award in 2017 and runs a mixed wet/dry lab that examines novel determinants of response to treatment in cancer. He also practises clinically as a Consultant Colorectal & General Surgeon at the Queen Elizabeth Hospital Birmingham.
Next-generation sequencing data presents a particular challenge in the field of data analysis and integration. With the advent of the 100,000 Genomes Project (100KG), a large volume of next-generation sequencing data will become available for analysis along with detailed clinical phenotype data. Up to this point, methods for analysing this data have been relatively sparse and limited to interactions between two dimensions of data, i.e. mutational status and a clinical phenotype (e.g. tumour stage). This fails to take account of the richness of the available data, both genomic and clinical, and of the interactions that can occur at multiple levels within these datasets.
This diversity of data takes multiple forms. Firstly, although the 100KG project will output mutations and structural variants derived from whole genome sequencing data, there are advanced plans to add layers to this data by measuring other genetic information. This will include RNA-seq (to measure gene expression, alternative splicing, etc.), methylation, chromatin structure (via Hi-C and ATAC-seq) and tumour heterogeneity (via single-cell sequencing). Secondly, the clinical dataset collected as part of this project will allow correction for, and modelling of, the many environmental influences on the relationship between genomic data and clinical outcomes. There are likely to be profound interactions between host genetics, tumour genetics and the environment that must be accounted for in any model of disease.