Introduction
Poor air quality in cities poses a significant threat to health and life expectancy. It’s estimated that more than 9,000 Londoners a year die early from causes related to air pollution. Similar issues affect most cities across the UK, Europe and the wider world.
Air quality in London has improved in recent years as a result of policies to reduce emissions, primarily from road transport, although significant areas still exceed EU limits for NO2. There has also been a revolution in air quality monitoring, making it possible to monitor air pollution at thousands of different locations in the city and greatly enhancing the ability to target and prioritise planned interventions. However, such networks of sensors produce an overwhelming amount of data of variable quality.
Researchers from The Alan Turing Institute’s programme in Data-centric Engineering, in collaboration with the University of Warwick, are working with partners including the Greater London Authority, to develop machine learning algorithms, data science platforms and statistical methodology to better integrate and analyse this sensor data. With a better understanding of air pollution in a complex urban environment like London, it’s possible to design better policy interventions, and hence improve urban quality of life and life expectancy.
How did it start?
The project originally came about via Turing Fellow Dr Theo Damoulas’ connection with urban analytics expert Mike Flowers. Flowers was visiting London in 2015 to advise the Greater London Authority (GLA), the London Office for Data Analytics (LODA) and innovation foundation Nesta on data science initiatives in government and academic partnerships. Through this connection and over discussions at dinner between Dr Damoulas, Andrew Collinge, then Director of Intelligence and Analysis at GLA, and Paul Hodgson, GLA Geographic Information Systems and Infrastructure Manager, a pilot collaboration between the GLA and the University of Warwick was formed. A group of Warwick MEng Computer Science students, supervised by Dr Damoulas, began working on the data, computational, statistical and machine learning aspects of air pollution in London.
“It’s exciting to see machine learning being applied to a problem with wide-reaching impact on quality of life”
Mark Girolami, Programme Director for Data-centric Engineering
In 2017 the Turing’s programme in Data-centric Engineering, funded by the Lloyd’s Register Foundation, began funding the research to further its reach and potential. Programme Director Mark Girolami explains how the work contributes to the programme’s aims. “It’s exciting to see machine learning methods being developed and scaled up to bear on a ‘big data’-scale problem with wide-reaching impact to the quality of life of Londoners. The challenges faced in the project in dealing with a complex process [like air pollution] include how to handle identifiability, interpretability and sampling bias within large scale datasets. These are challenges that arise across the programme.”
Damoulas reflects, “Being part of the Turing and under the research umbrella of the Data-centric Engineering programme has tremendous benefits for the project. We have unique access to the best data science and AI expertise in the country, strong leadership and a superb team of strategy and communication professionals. Everyone here helps to support and propel us further.”
What happened?
The group has been exploring ways to better integrate information from the many air quality sensors in use, which vary significantly in their characteristics. “The group is handling more than 1TB of data sources, and that’s growing every minute,” Damoulas explains. “These capture various aspects of air in London from air pollution sensor measurements to traffic jams, weather and street canyons [the way air behaves in streets flanked by tall buildings]. We are developing algorithms that can deal with this variability, or ‘heterogeneity’.”
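To make the idea of combining sensors of differing quality concrete, here is a minimal sketch of inverse-variance weighting, a textbook way to merge readings when each sensor's noise level is roughly known. It is an illustration only, not the group's method: the article does not describe their algorithms in this detail, and the function name, the example readings and the variances are all assumptions.

```python
from typing import List, Tuple


def fuse_readings(readings: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine (value, variance) pairs by inverse-variance weighting.

    Hypothetical helper for illustration: noisier sensors (larger variance)
    get proportionally less weight in the fused estimate.
    """
    if not readings:
        raise ValueError("need at least one reading")
    if any(variance <= 0 for _, variance in readings):
        raise ValueError("variances must be positive")

    weights = [1.0 / variance for _, variance in readings]
    total_weight = sum(weights)
    fused_value = sum(w * value for w, (value, _) in zip(weights, readings)) / total_weight
    fused_variance = 1.0 / total_weight
    return fused_value, fused_variance


# Assumed example: a reference-grade monitor (variance 4) and a low-cost
# sensor (variance 16) both measuring NO2 at the same place and time.
fused_value, fused_variance = fuse_readings([(40.0, 4.0), (50.0, 16.0)])
print(fused_value, fused_variance)
```

The fused estimate sits closer to the more reliable sensor, and its variance is lower than either input's. Real deployments face harder problems, such as sensors at different locations, drifting calibration and unknown noise levels, which is where more sophisticated statistical models come in.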
The project has benefitted from being part of navigation specialists Waze’s ‘Connected Citizens Program’, providing real-time traffic data which, in conjunction with TfL data being supplied by the GLA, allows for a more accurate picture of key areas and levels of pollution across London.

Statistically speaking, estimating and accurately forecasting air pollution from this data involves developing algorithms for ‘multivariate change point detection’: identifying the times at which the probability distribution of a random process changes, when that process is observed through several variables at once. Here, that means detecting changes in air pollution across all the different sensor data sources. The group have developed and published such algorithms, and others dealing with related machine learning challenges, presenting them at prestigious machine learning conferences such as ICML and NeurIPS.
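As a rough illustration of what change point detection means in practice, the sketch below scans a small multivariate time series for the single split that best separates it into two segments with different means. This is a deliberately simple least-squares approach, not the group's published algorithms; the function names and the synthetic data (two made-up variables standing in for a pollution reading and a traffic count) are assumptions.

```python
from typing import List, Sequence, Tuple


def segment_cost(rows: Sequence[Sequence[float]]) -> float:
    """Sum of squared deviations from the per-variable mean of a segment."""
    n = len(rows)
    if n == 0:
        return 0.0
    cost = 0.0
    for d in range(len(rows[0])):
        mean = sum(row[d] for row in rows) / n
        cost += sum((row[d] - mean) ** 2 for row in rows)
    return cost


def best_change_point(
    series: Sequence[Sequence[float]], min_size: int = 2
) -> Tuple[int, float]:
    """Return (index, gain) for the single split that most reduces the cost.

    `index` is the position of the first observation after the change.
    `gain` is how much the total cost drops when the series is modelled
    as two segments instead of one.
    """
    n = len(series)
    if n < 2 * min_size:
        raise ValueError("series too short for the requested min_size")
    total = segment_cost(series)
    best_index, best_gain = min_size, float("-inf")
    for t in range(min_size, n - min_size + 1):
        gain = total - segment_cost(series[:t]) - segment_cost(series[t:])
        if gain > best_gain:
            best_index, best_gain = t, gain
    return best_index, best_gain


# Synthetic data (assumed): both variables shift level after six time steps.
before: List[Tuple[float, float]] = [(40.0 + i % 2, 100.0 + i % 3) for i in range(6)]
after: List[Tuple[float, float]] = [(60.0 + i % 2, 150.0 + i % 3) for i in range(6)]
change_index, gain = best_change_point(before + after)
print(change_index)  # 6
```

Because the cost is summed over every variable, a shift that shows up in several data sources at once produces a larger gain than a blip in just one. Methods used on real city data also have to handle multiple change points, noise, missing readings and streaming updates, which this sketch ignores.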
Another focus of the work is on improving the sensors themselves. “Many sensors exist but very few of them have been shown to be reliable enough for a complex city environment like London,” says Research Associate Ollie Hamelijnck. “We are working with some of our partners to analyse our data to better understand the performance of air pollution sensors and develop better solutions.”
As well as the collaborations already mentioned, and those with various London Boroughs and communities with an interest in addressing air pollution, the Turing has helped the group establish international collaborations, from the University of Sydney’s Centre for Translational Data Science and Data61 to the University of Hong Kong.
Through this research, development and collaboration, the group are forming a real-time monitoring network that allows for high resolution air quality forecasting and change point detection. Damoulas explains, “The tools we’re producing will help establish the most effective places to site future sensors, and inform policy makers to make targeted interventions that reduce the levels of pollution in key areas and at key times.”
The group are building their research outputs into a mobile-friendly API (application programming interface) that will provide rolling hyper-local forecasts of air pollution across London, up to 48 hours ahead of time. They expect to showcase the API and open it to the public in the very near future.

What does the future hold?
“Working with the Turing will continue our efforts to make life better for all Londoners”
Theo Blackwell, Chief Digital Officer, Greater London Authority
Dr Damoulas: “We are working closely with the GLA at the moment to collect, integrate and release to the public additional data sources on street canyons and air pollution sensor measurements. 2019 to 2020 is our impact year, where a lot of the analysis, conclusions and interventions from our algorithms and models will take centre stage.”
As a result of the collaborations and the research and development already established, the group will be able to access richer sources of traffic flow and air pollution measurements in 2019. This richer data will help make the models and forecasts even more accurate, reliable and up to date. See the dedicated project page for updates about the continuing work.
Theo Blackwell, the GLA's Chief Digital Officer, says “Working with The Alan Turing Institute will continue our efforts to harness London’s world-class strengths in data science and innovation to clean up the air we breathe to make life better for all Londoners.”
The project will also benefit from the support of the Turing’s Research Software Engineering team, who contribute skills in software engineering and data to projects across the Institute. The team will help develop and release additional algorithms, APIs and web platforms.
“Air pollution is an amalgamation of anthropocentric and natural processes that are coupled together,” Damoulas concludes, “Such dynamic, coupled processes are everywhere in cities, so the scientific advances we’re making in this project have the potential to impact the way we think about cities altogether.”
“The advances we’re making have the potential to impact the way we think about cities altogether”
Theo Damoulas, Turing Fellow