Streaming data modelling for real-time monitoring and forecasting

Developing scalable algorithms for extracting useful information from large, complex and ever-growing data sets in near real time

Organisations increasingly want to gain actionable insight from data in near real time, as it is generated or recorded. Statistical models of the data-generating process, combined with recently developed fitting algorithms and computational infrastructure designed for streaming data, offer a promising approach to this problem. The methods developed in this project will be applied to challenging real-world problems in urban analytics and healthcare.

Explaining the science

Integrated modelling of data sources from streaming data networks will typically require the inversion of explanatory stochastic state-space models. Although significant computational and methodological advances have been made in this area in recent years, true simultaneous inference for both static parameters and dynamic states of non-trivial models remains challenging. Recent developments in sequential Monte Carlo (SMC) methodology provide a number of promising approaches for further study.
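To make the idea of sequential Monte Carlo concrete, the following is a minimal sketch of a bootstrap particle filter, one of the simplest SMC algorithms. It is an illustration only, not the project's method: the toy local-level model, the function names `simulate` and `bootstrap_filter`, and all parameter values are assumptions chosen for brevity. The static parameters (the two noise standard deviations) are treated as known, which sidesteps exactly the joint parameter and state inference problem described above.

```python
import numpy as np


def simulate(n_steps=200, sigma_x=0.5, sigma_y=1.0, seed=1):
    """Simulate an assumed toy local-level model:
    x_t = x_{t-1} + N(0, sigma_x^2),  y_t = x_t + N(0, sigma_y^2)."""
    rng = np.random.default_rng(seed)
    xs = np.cumsum(rng.normal(0.0, sigma_x, size=n_steps))  # hidden states
    ys = xs + rng.normal(0.0, sigma_y, size=n_steps)        # noisy observations
    return xs, ys


def bootstrap_filter(ys, n_particles=1000, sigma_x=0.5, sigma_y=1.0, seed=0):
    """Return the filtered mean of the hidden state after each observation.

    The noise levels sigma_x and sigma_y are assumed known here.
    """
    rng = np.random.default_rng(seed)
    # Initial particles drawn from an assumed N(0, 1) prior on the state.
    particles = rng.normal(0.0, 1.0, size=n_particles)
    means = []
    for y in ys:
        # Propagate each particle through the state transition.
        particles = particles + rng.normal(0.0, sigma_x, size=n_particles)
        # Weight by the Gaussian observation likelihood (log scale for stability).
        log_w = -0.5 * ((y - particles) / sigma_y) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # Record the weighted mean as the online state estimate.
        means.append(float(np.sum(w * particles)))
        # Multinomial resampling to counter weight degeneracy.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
    return np.array(means)


xs, ys = simulate()
est = bootstrap_filter(ys)
```

Each observation is processed once and then discarded, so the cost per time step is fixed by the number of particles, which is what makes this family of methods a natural fit for streaming data. Extending the sketch to learn the static parameters online would require additional machinery that is not shown here.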

Project aims

Recent methods from the literature for online analysis and forecasting of state-space models will be tested, developed and extended through two challenging real-world applications. The methods will be published in both the statistical and applied literature, and an open-source software library will be produced to enable routine use of the newly developed methods by any interested party.


In addition to producing improved statistical algorithms, this work will directly advance the state of the art in two user domains.

The first application will be to live streaming data from Newcastle’s Urban Observatory – one of the largest public sources of smart city data in the world. Its environmental sensor data, which are both multivariate and spatially distributed, will provide a challenging use case for online statistical modelling.

The second application will be to healthcare analytics data, such as data from wearable sensors. Multiple heterogeneous sensors will be modelled jointly in real time for forecasting and alerting.


Researchers and collaborators