AI for control problems

Using a competition platform to accelerate progress in data-driven control problems

Complex control problems arise in the operation of critical infrastructure, including electricity, gas, water and transportation. This project is developing 'RangL', an AI competition environment for practitioners (both novice and experienced) to apply classical and machine learning techniques and expert knowledge to UK-centric problems. The project will work with industrial challenge holders to develop insights into the leading classes of solution.

Explaining the science

Infrastructure systems, including electricity, gas, water and transportation, must operate reliably at acceptable cost in the context of ageing infrastructure and new technological possibilities. At the same time a greater amount of measurement and forecast data is becoming available. There is a growing need for appropriate AI controllers to leverage this data. Artificial intelligence, with its speed, scale and accuracy, offers transformative potential in applications to these problems. However, controllers for critical infrastructure should be robustly assessed in an appropriate simulation environment.

This project is developing a competition platform in which challenge environments are formulated in the reinforcement learning framework using the “agent-environment loop”. At each timestep the controller (agent) chooses an action based on the current observation, and the challenge environment returns a new observation and a reward. The aim is to create controllers with intelligent characteristics, capable of handling both quantifiable and unquantifiable uncertainty and encoding expert knowledge. Competition entrants document their controllers, helping to develop insights into the leading classes of solution.
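As a toy illustration of this agent-environment loop, here is a minimal sketch in the style of the OpenAI Gym interface. The environment and agent below are invented placeholders, not part of any RangL challenge:

```python
class ToyEnv:
    """A trivial stand-in environment: the observation is a step counter,
    the reward is 1.0 per step, and the episode ends after 5 steps."""

    def reset(self):
        self.t = 0
        return self.t  # initial observation

    def step(self, action):
        self.t += 1
        observation, reward = self.t, 1.0
        done = self.t >= 5
        return observation, reward, done, {}  # info dict, as in Gym


def constant_agent(observation):
    """A placeholder controller: always chooses the same action."""
    return 0


env = ToyEnv()
obs, done, total_reward = env.reset(), False, 0.0
while not done:
    action = constant_agent(obs)                # agent chooses an action
    obs, reward, done, info = env.step(action)  # environment responds
    total_reward += reward
print(total_reward)  # 5.0
```

A real RangL controller replaces `constant_agent` with any policy the entrant likes, from a hand-coded heuristic encoding expert knowledge to a trained RL agent.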

Control system visualisation
Example of a three-zone power system based on Great Britain. Scheduling power plants is challenging due to uncertainty about near-future consumption and renewable generation. Supply and demand must match in real time, and energy transfers between regions are limited by network capacity. RangL invites AI practitioners to devise cost-effective strategies for scheduling the power system under uncertainty, with the potential to save consumers money, accelerate decarbonisation and improve the reliability of electricity supply.

Project aims

'RangL' is a competition platform created at The Alan Turing Institute as a new model of collaboration between academia and industry. Through integration with OpenAI Gym, RangL offers a user-friendly environment in which to develop learning approaches to data-driven control problems. Anybody can propose a RangL challenge, compete in a challenge by designing a controller, or contribute an 'off-the-shelf' AI controller for users to customise.

The platform assesses user-submitted algorithms for specific tasks, helping the best classes of solution to emerge: a proven mechanism for realising the potential of AI.

Project updates

1 March 2022 – Pathways to Net Zero winners announced

The Pathways to Net Zero RangL challenge took place from 17 to 31 January 2022. After considering both the leaderboard and the executive summaries, the selection panel chose three joint winners:

  • Epsilon-greedy (Delft University of Technology)
  • Lanterne-Rouge-BOKU-AIT (University of Natural Resources and Life Sciences, Vienna, and Austrian Institute of Technology)
  • VUltures (Vrije Universiteit Amsterdam).

Additionally, Team AIM-Mate were highly commended for their efforts. The Net Zero Technology Centre, which sponsored the challenge, will hold a webinar with the winning teams on 28 March 2022. The final leaderboards can be viewed at the challenge repository. Thank you to all who participated.

22 November 2021 – Qualitative and quantitative evaluation

As the environment had reached a suitable state of development, this week the group discussed possible evaluation criteria for the challenge. The RangL project aims to collect best practice in the application of RL (and also other optimisation approaches) to control problems in industry. Thus while it would be possible to use only leaderboard scores, we decided to ask participants to submit a one-page executive summary of their approach, which would be considered in the overall evaluation.

8 November 2021 – Lagrangian formulation

In RL agents learn from experience. However if rewards occur only after a long sequence of actions, it can be difficult for an RL agent to associate this long sequence with the eventual reward. An example is a game of chess, if the reward is simply 1 for winning and 0 for losing. Similarly, if problem constraints are handled by awarding a large negative penalty when a constraint is breached, this can make reinforcement learning challenging. To address this, the group experimented with a Lagrangian reformulation to transform the constrained problem to an unconstrained one by modifying the reward function. As a result, the constraint on job numbers was replaced by adding a reward term proportional to the number of jobs created at each step.
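The reshaping can be sketched as follows. The threshold, profit figures and the multiplier `lambda_jobs` are illustrative placeholders, not values from the challenge model:

```python
# Sketch of the Lagrangian-style reward reshaping described above.
JOBS_FLOOR = 1000    # hypothetical hard constraint: jobs must stay above this
lambda_jobs = 0.05   # hypothetical Lagrange multiplier (a tunable weight)


def constrained_reward(profit, jobs):
    """Original formulation: a large penalty when the constraint is breached,
    producing a reward 'cliff' that is hard for an agent to learn from."""
    penalty = -1e6 if jobs < JOBS_FLOOR else 0.0
    return profit + penalty


def lagrangian_reward(profit, jobs_created):
    """Reformulation: the constraint becomes a smooth reward term
    proportional to the number of jobs created at each step."""
    return profit + lambda_jobs * jobs_created


print(constrained_reward(100.0, 900))  # -999900.0 (cliff)
print(lagrangian_reward(100.0, 50))    # 102.5 (smooth learning signal)
```

The smooth version gives the agent gradient-like feedback at every step, instead of a rare catastrophic penalty it struggles to attribute to any particular action.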

18 October 2021 – Correlations 

Typically a spreadsheet will not include any model of randomness – in other words, it is deterministic. In reinforcement learning it can be straightforward to include randomness, and this is one of the strengths of RL.  

We will leave the choice between deterministic (without randomness) and stochastic (with randomness) modelling for another blog post. The interesting point for us this week was correlations: how we describe the extent to which random factors tend to move together. In deterministic modelling there is simply no need to think about how, for example, the prices of natural gas and blue hydrogen (which is derived from natural gas) move together. In contrast, in stochastic modelling different correlations could even drive different solutions. Fortunately, in this project we have the luxury of discussions with some of the creators of the Integrated Energy Vision (the spreadsheet model on which the challenge is based), so correlations can be chosen in an informed way.

4 October 2021 – Direct actions

Today the group agreed a modification to the challenge outline: the RL agent will now directly specify the rate of deployment of offshore wind, blue and green hydrogen. This is achieved by allowing the RL environment to interact directly and repeatedly with the IEV spreadsheet model, both simplifying the approach and increasing its transparency and interpretability. Initial results from training an RL agent with this environment were shared and sense-checked. 
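A minimal sketch of this "direct actions" design: at each step the agent outputs deployment rates for the three technologies, and the environment clips them to feasible bounds before updating installed capacity. The rates and caps below are placeholders, not values from the IEV spreadsheet:

```python
# Hypothetical per-step deployment-rate caps (e.g. GW per year).
MAX_RATE = {"offshore_wind": 5.0, "blue_hydrogen": 2.0, "green_hydrogen": 2.0}


def apply_action(capacity, action):
    """Clip each requested rate to [0, cap] and add it to installed capacity."""
    return {
        tech: capacity[tech] + min(max(action[tech], 0.0), MAX_RATE[tech])
        for tech in capacity
    }


capacity = {"offshore_wind": 10.0, "blue_hydrogen": 1.0, "green_hydrogen": 0.5}
action = {"offshore_wind": 6.0, "blue_hydrogen": 1.5, "green_hydrogen": -1.0}
capacity = apply_action(capacity, action)
print(capacity)  # wind clipped to +5.0 -> 15.0; blue -> 2.5; green clipped to 0 -> 0.5
```

Because the action is the deployment rate itself, anyone inspecting a trained agent can read its policy directly as a build schedule, which is where the gain in transparency and interpretability comes from.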

20 September 2021 – Project video

Having decided the general shape of the challenge, today the project group agreed to begin working in an agile fashion through a GitHub project board. 

Excitingly, RangL was also invited to be part of the Net Zero Technology Centre's virtual showcase "Road to Glasgow: Destination Net Zero" at the 26th UN Climate Change Conference of the Parties (COP26) in Glasgow in November 2021. A virtual exhibition booth will include "meet the developers" live sessions and a project video explaining the Pathways to Net Zero challenge.  

19 July 2021 – Reward functions

In reinforcement learning the agent learns to maximise the rewards it receives. The reward function is therefore an integral part of the problem statement, and this week's efforts centred on finalising its exact form. Given the aims of the study, the project group decided to include the cost of total carbon emissions in the reward alongside UK energy sector profits (that is, energy revenues minus capital, operating and decommissioning costs).
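A hedged sketch of a reward of this form, profits minus the social cost of carbon. The carbon price and all figures are illustrative placeholders, not values agreed by the project group:

```python
CARBON_PRICE = 50.0  # hypothetical social cost, currency units per tonne CO2


def step_reward(revenue, capex, opex, decommissioning, emissions_tonnes):
    """Per-step reward: energy sector profit minus the cost of emissions."""
    profit = revenue - capex - opex - decommissioning
    return profit - CARBON_PRICE * emissions_tonnes


# Example step: profit of 450.0 less 200.0 in emissions cost.
print(step_reward(1000.0, 300.0, 200.0, 50.0, 4.0))  # 250.0
```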

The reward function can also be used to place constraints on acceptable solutions. While the shift to zero-carbon technologies can lead to increased employment in the long term under our modelling, it is important to ensure that job numbers are also managed in the short and medium term. 

It was agreed that recent volatility in energy market prices highlights the importance of incorporating randomness in the RL environment. In addition to reflecting real-world uncertainty, this allows the agent to learn to adapt under a variety of scenarios. 

1 July 2021 – Pathways to Net Zero RL environment

RangL aims to apply reinforcement learning (RL) to real-world industrial problems by involving participants from the wider AI community. Today, our focus was therefore on developing an appropriate RL environment for the Pathways to Net Zero challenge.

The objective is to find optimal deployments for technologies such as offshore wind, blue and green hydrogen, and carbon capture and storage. These technologies will be instrumental in reaching the UK’s target of net zero carbon by 2050.

After brainstorming, we opted to take Breeze, Gale and Storm as baseline scenarios from which others can be built. An agent will interact with the RL environment by choosing a mix of those scenarios and by varying the speed with which they are implemented. For instance, earlier deployment reduces lifetime emissions but generally implies higher capital costs. Solutions will also need to meet some non-monetary constraints, e.g. balancing job creation in new technologies against the loss of roles in decommissioned infrastructure. We will also work with the Net Zero Technology Centre and ORE Catapult to extend the Integrated Energy Vision appropriately, so that lifetime emissions and their social cost can be considered in the RL reward function.
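"Choosing a mix of scenarios" can be read as taking a convex combination of the three baseline pathways, scaled by a deployment-speed factor. The deployment figures below are invented for this sketch, not taken from the IEV:

```python
# Hypothetical 2050 deployment levels under each baseline scenario
# (e.g. GW of offshore wind).
BREEZE, GALE, STORM = 40.0, 60.0, 80.0


def blended_deployment(weights, speed=1.0):
    """weights: (w_breeze, w_gale, w_storm), non-negative and summing to 1.
    The result interpolates between the baseline pathways; the speed
    factor scales how quickly the blended pathway is rolled out."""
    wb, wg, ws = weights
    assert abs(wb + wg + ws - 1.0) < 1e-9, "weights must sum to 1"
    return speed * (wb * BREEZE + wg * GALE + ws * STORM)


print(blended_deployment((0.5, 0.25, 0.25)))  # 20 + 15 + 20 = 55.0
```

An agent's action would then be the weight vector plus the speed, giving it a continuum of intermediate pathways rather than a choice among three fixed ones.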

21 June 2021 – Pathways to Net Zero project kick-off

The RangL competition platform exists to accelerate progress in data-driven control problems for industry, and today marks the real beginning of that journey.

The purpose of the Net Zero Technology Centre is to develop and deploy technology for an affordable net zero energy industry, and the Offshore Renewable Energy (ORE) Catapult is the UK’s leading technology innovation and research centre for offshore renewable energy.

Today, colleagues from the Net Zero Technology Centre, ORE Catapult and RangL team gathered to make a start on the Pathways to Net Zero challenge. The agenda was focused on first introducing the competition platform and then understanding the Net Zero Technology Centre / ORE Catapult Integrated Energy Vision (IEV) model, on which the challenge will be based.

The IEV is the result of a major modelling exercise undertaken collaboratively by the Net Zero Technology Centre and ORE Catapult, and addresses the UK's vision of achieving net zero carbon emissions by 2050 for the North Sea offshore energy industry. The range of possibilities is illustrated by three imagined pathways, Breeze, Gale and Storm, each addressing the four main technology pillars of offshore energy: offshore wind, oil and gas, hydrogen, and carbon capture and storage (CCS).

The Pathways to Net Zero challenge aims to build on the IEV by first allowing a range of intermediate pathways between Breeze, Gale and Storm, then defining a criterion to measure the quality of each pathway in a specific sense. Challenge participants will be invited to apply reinforcement learning, or any other approach of their choice, to find the ‘best’ pathway. The challenge will be made more realistic and difficult by the inclusion of uncertainty over future parameters such as energy revenues and technological progress.

18–25 January 2021 – First RangL Challenge

From 18 to 25 January 2021 the RangL team fulfilled a long-held ambition: to run a generation scheduling challenge. The problem involves using continually updated forecasts for energy demand and renewable energy generation to schedule, and so to minimise, the use of fossil fuels. It is challenging partly because the observation space is large (at each step, the agent is given forecasts for all time periods) and also because the forecasts are updated as new information arrives, so each forecast is guaranteed to be superseded by a better one.
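The structure of this "look-ahead" observation can be sketched as follows: at each step the agent sees a forecast for every remaining period, with forecasts further ahead being noisier, and all of them refreshed as time advances. The horizon length, demand profile and noise-growth rule are all hypothetical:

```python
import random

HORIZON = 96  # number of scheduling periods (illustrative)


def observation(t, true_demand, rng):
    """Forecasts for all remaining periods s = t, ..., HORIZON-1.
    Forecast noise grows (here, linearly) with the look-ahead distance s - t,
    so each period's forecast improves as the step approaches."""
    return [true_demand[s] + rng.gauss(0, 0.1 * (s - t))
            for s in range(t, HORIZON)]


rng = random.Random(42)
# A toy daily demand shape (arbitrary units).
true_demand = [30.0 + 10.0 * (t % 24) / 24 for t in range(HORIZON)]

obs_early = observation(0, true_demand, rng)   # 96 forecasts, mostly noisy
obs_late = observation(90, true_demand, rng)   # 6 forecasts, fairly accurate
print(len(obs_early), len(obs_late))  # 96 6
```

Flattening all of these forecasts into one observation vector is what makes the observation space so large, and the shrinking noise is what guarantees that today's forecast will be superseded tomorrow.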

This ‘look-ahead mode’ generation scheduling was one of the first motivations for RangL, when the project was conceived in early 2019 during the Mathematics of Energy Systems research programme at the Isaac Newton Institute in Cambridge. While not directly connected, it’s interesting to note that the forthcoming special issue of Philosophical Transactions of the Royal Society A based on the MES programme has an article by Peter Glynn and Jacques de Chalendar on theoretical aspects of this kind of problem (titled “On incorporating forecasts into linear state space model Markov decision processes”).

The competition itself was heavily oversubscribed, with applicants from Argentina, Denmark, the Netherlands, Italy, France and the UK, drawn from academia, industry and the third sector. We'd like to thank all participating teams, who generated a fantastic atmosphere on our Slack channel throughout the week. It must have been good, as one competitor even joined the RangL team. The winners were team zeepkist, with members from the Intelligent Electrical Power Grids group at TU Delft and TenneT, the Dutch power system operator. The final scores, and zeepkist's winning code (which used RL), are here in the challenge repository.

We recently argued on the Turing blog that as the world reopens following the pandemic, we will need to make more flexible, responsive and data-driven decisions. Hopefully this first challenge illustrates a small part of the potential role that reinforcement learning can play.


Researchers and collaborators

Jia-Chen Hua

Postdoctoral Research Technician, Queen Mary University of London