Monitoring urban air quality and predicting its future evolution is one of many problems that require modelling and understanding numerous quantities which vary in both space and time. This project is developing approaches based on state-of-the-art computational methods, together with strategies for reducing very large problems to tractable collections of simpler problems whose solutions can be combined in a principled way. These approaches will be very broadly applicable, but in the present project they will be applied to the problem of understanding air quality in Greater London.
Explaining the science
Methods for conducting inference, i.e. estimating the parameters of an indirectly observed system, in large complex systems are urgently needed. Existing technology does not generally scale well to the very large data sets which arise in many modern data-rich contexts. Most recent developments in computational statistics which aim to improve the scalability of existing algorithms have focused on data of a very particular form, namely data that can be viewed as very large numbers of replicate measurements which are independent of one another. Such methods are not suitable for data sets with strong spatial and temporal structure, as many data sets obtained in urban analytics settings have.
This project aims to develop a suite of methodological tools for conducting inference in models of this sort in a computationally efficient way, by exploiting the structure of the models in order to provide simultaneously efficient computational tools and good estimation. Furthermore, leveraging recent developments in the field of robust statistics, these methods will be adapted to deal with settings in which the modelling is imperfect and the data generating process is not exactly characterized by the mathematical model. This robustness is essential to obtain good performance in real, complex scenarios.
The fundamental statistical objectives of the work are encapsulated within three work packages, each with its own particular objectives.
WP1: Aims to combine ideas from robust statistics with modern computational methodology to provide a computational framework for conducting estimation which is insensitive to outliers and certain types of model misspecification, which are inevitable in the types of models in which this project is interested. This work package has two strands. The first seeks to provide efficient algorithms for online inference as observations become available within this framework, by adapting algorithms known as particle filters to this setting. The second seeks to address a broader class of inference problems by employing sequential Monte Carlo samplers within a similar robust inference framework.
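To give a flavour of the particle filters mentioned in WP1, the following is a minimal sketch of a standard (non-robust) bootstrap particle filter. It is an illustration under stated assumptions, not the project's method: the toy linear-Gaussian model, the function name bootstrap_particle_filter and all parameter names and default values are hypothetical choices made for this example. A robust variant of the kind WP1 describes would, roughly speaking, change how particles are weighted against each observation; that modification is not shown here.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500, phi=0.9,
                              sigma_x=1.0, sigma_y=1.0, seed=0):
    """Return filtering means for an assumed toy model (requires |phi| < 1):

        x_t = phi * x_{t-1} + N(0, sigma_x^2)   (hidden state)
        y_t = x_t + N(0, sigma_y^2)             (observation)
    """
    rng = random.Random(seed)
    # Initialise particles from the stationary distribution of the state.
    sd0 = sigma_x / math.sqrt(1.0 - phi ** 2)
    particles = [rng.gauss(0.0, sd0) for _ in range(n_particles)]
    means = []
    for t, y in enumerate(observations):
        if t > 0:
            # Propagate each particle through the state dynamics.
            particles = [phi * x + rng.gauss(0.0, sigma_x) for x in particles]
        # Weight each particle by the Gaussian likelihood of the new
        # observation, working on the log scale for numerical stability.
        log_w = [-0.5 * ((y - x) / sigma_y) ** 2 for x in particles]
        max_log_w = max(log_w)
        weights = [math.exp(lw - max_log_w) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted average of the particles estimates E[x_t | y_1, ..., y_t].
        means.append(sum(w * x for w, x in zip(weights, particles)))
        # Multinomial resampling: duplicate likely particles, drop unlikely ones.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means
```

Because observations are processed one at a time, the estimate can be updated as each new measurement arrives, which is what makes this family of algorithms natural for online inference. For example, bootstrap_particle_filter([2.0] * 30) returns 30 filtering means, one per observation.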
WP2: Aims to combine the ideas of WP1 with a class of algorithms known as divide and conquer sequential Monte Carlo methods which allow for inference in distributed settings in an efficient way, by iterative decomposition of large problems into a collection of smaller problems which can be addressed separately, with their solutions combined in a principled manner to provide good solutions to the original problem.
WP3: Specialises and extends the output of the previous two generic methodology work packages to the particular setting of spatio-temporal modelling, and in particular to the types of Gaussian process models which are used in the application domain of interest, i.e. the monitoring of urban air quality.
In the application domain, the project will also provide a platform for inference in the context of urban analytics. More specifically, air quality monitoring in the Greater London area will be cast as a spatio-temporal inference problem of the sort that the proposed methodology is intended to address efficiently. In collaboration with the Greater London Authority the project aims to substantially enhance the quality and reliability of air quality monitoring and forecasting in this area.
An additional core outcome of the research will be the development and release of software to make use of the developed methodology, which will facilitate the use of these methods in the broad range of contexts in which they are expected to be applicable.
The work will be applied in the first instance to monitoring air quality and predicting its future evolution.
Air quality monitoring is a tremendously important and tremendously challenging area. Diverse sensor networks exist on different scales and provide measurements with quite different characteristics to one another. Fusing this information as observations become available is a large scale statistical inference problem. Indeed, problems of this type motivate the methodological development of this project and will serve as an extensive test-bed for the developed methodology. An extended application of those methods to air quality monitoring in the Greater London area with the support of the Greater London Authority provides the second major aspect of this work.
April 2020: Juan Kuntz-Nussio has joined the project.
September 2020: Robust Filtering Paper accepted by NeurIPS 2020.
October 2021: Theoretical properties of divide-and-conquer SMC report posted to arXiv.
October 2021: Divide-and-Conquer Fusion report posted to arXiv.
October 2021: Jure Vogrinc has joined the project.
December 2021: Product-form estimators paper accepted by Statistics and Computing.
April 2022: Node-wise pseudo-marginal method paper accepted by Statistics and Computing.
April 2022: Scalable particle-based alternatives to EM report posted to arXiv.