Bio

Paolo Missier is a Reader in Large-Scale Information Management with the School of Computing at Newcastle University. Paolo's background is broadly in data and information management, with a specific long-term research interest in data provenance, where, amongst other research contributions, he was instrumental in the specification of the PROV data model for provenance through the W3C.

More recently, and closer to Big Data management and Data Science and Engineering, Paolo studied the problem of optimising re-computation of resource-intensive data analytics processes when their outcomes are invalidated through changes in the processes' inputs and dependencies (ReComp: http://recomp.org.uk, 2016-2019, EPSRC funding).

More can be read about current and past projects on Paolo's personal page. Paolo's 20+ year career started with an applied research position in industry at Bell Communication Research (1994-2001). He then obtained his PhD in Computer Science at the University of Manchester in 2007, which led to his current academic career, first at Manchester (2008-2010) and, since 2011, at Newcastle.

Research interests

At the Turing, Paolo is leading the [email protected] project on using Data Science in the healthcare domain, specifically to help realise a vision of predictive, personalised and participatory medicine. The project aims to study the predictive power of signals extracted from wearable devices worn in free-living conditions, in combination with personal genetic features, with a specific focus on age-related diseases.

The [email protected] project will address some of the key data science challenges associated with the development of new preventive, predictive and personalised models that define the data-driven future of healthcare. It will contribute to the Turing's 'Revolutionise healthcare' challenge, with specific focus on (i) enabling early and precise detection, diagnosis, and treatment of illnesses, and (ii) predicting or preventing diseases for those at highest risk.

Chronic and metabolic diseases are a leading cause of death and disability. Current data acquisition technology provides the means to fully characterise an entire population of individuals in terms of a broad diversity of quantitative datasets, ranging from periodic but low-rate multi-omics data (genomics, proteomics, metabolomics, and more) to continuous, high-rate self-monitoring data from wearable sensors. Translating this wealth of population-scale big data into models that can predict early disease onset at the granularity of the single individual entails multiple data science challenges. These include integrating diverse signals and understanding the correlations amongst them, engineering multi-scale features from monitoring time series data, and exploring machine learning approaches to building effective predictive models, as well as data engineering challenges, such as distributed architectures for scalable data processing.

The UK Biobank will provide an initial testbed for experimentation, consisting of a cohort of individuals who are longitudinally characterised in terms of genotyping, phenotyping, and free-living monitoring from wearable devices. Additional testbeds will be added for independent testing and validation. This project is a collaboration between the School of Computing and the Institute of Genetic Medicine at Newcastle University, the National Innovation Centre for Ageing, and the NIHR Innovation Observatory, both based in Newcastle, with funding from the Turing.