Introduction
Complex control problems arise in the operation of critical infrastructure, including electricity, gas, water and transportation. This project is developing RangL, an AI competition environment in which practitioners (both novice and experienced) can apply classical and machine learning techniques and expert knowledge to UK-centric problems. The project will work with industrial challenge holders to develop insights into the leading classes of solution.
Explaining the science
Infrastructure systems, including electricity, gas, water and transportation, must operate reliably at acceptable cost in the context of ageing infrastructure and new technological possibilities. At the same time a greater amount of measurement and forecast data is becoming available. There is a growing need for appropriate AI controllers to leverage this data. Artificial intelligence, with its speed, scale and accuracy, offers transformative potential in applications to these problems. However, controllers for critical infrastructure should be robustly assessed in an appropriate simulation environment.
This project is developing a competition platform in which challenge environments are formulated in the reinforcement learning framework using the “agent-environment loop”. At each timestep the controller (agent) chooses an action based on the current observation, and the challenge environment returns a new observation and a reward. The aim is to create controllers with intelligent characteristics, capable of handling both quantifiable and unquantifiable uncertainty and encoding expert knowledge. Competition entrants document their controllers, helping to develop insights into the leading classes of solution.
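The agent-environment loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a RangL environment: the class `ToyDemandEnv`, its supply-matching reward and both policies are hypothetical, and the `reset`/`step` methods only imitate the Gym-style interface (Gym's own `step` returns further values that are omitted here).

```python
import random


class ToyDemandEnv:
    """Hypothetical toy environment with a Gym-style reset/step interface."""

    def __init__(self, horizon=24, seed=0):
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0
        self.demand = 0.0

    def reset(self):
        self.t = 0
        self.demand = self.rng.uniform(0.0, 1.0)
        return self.demand  # the first observation

    def step(self, action):
        # The reward penalises the mismatch between supply (the action)
        # and the demand that the agent has just observed.
        reward = -abs(action - self.demand)
        self.t += 1
        done = self.t >= self.horizon
        self.demand = self.rng.uniform(0.0, 1.0)  # the next observation
        return self.demand, reward, done


def run_episode(env, policy):
    """Run the agent-environment loop once and return the total reward."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)                  # agent chooses an action
        observation, reward, done = env.step(action)  # environment responds
        total_reward += reward
    return total_reward


def matching_policy(observation):
    return observation  # supply exactly the observed demand


def fixed_policy(observation):
    return 0.5  # ignore the observation entirely


matched = run_episode(ToyDemandEnv(seed=1), matching_policy)
fixed = run_episode(ToyDemandEnv(seed=1), fixed_policy)
print(matched, fixed)
```

In this toy setting a controller that uses the observation loses no reward, while one that ignores it does, which is the kind of comparison a challenge leaderboard makes at scale.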
Project aims
RangL is a competition platform created at The Alan Turing Institute as a new model of collaboration between academia and industry. Through integration with OpenAI Gym, RangL offers a user-friendly environment in which to develop learning approaches to data-driven control problems. Anybody can propose a RangL challenge, compete in a challenge by designing a controller, or contribute an ‘off-the-shelf’ AI controller for users to customise.
Competition platforms of this kind assess user-submitted algorithms on specific tasks, helping the best classes of solution to emerge. This is a proven mechanism for realising the potential of AI.
Project updates
1 March 2022 – Pathways to Net Zero winners announced
The Pathways to Net Zero RangL challenge took place from 17 to 31 January 2022. After considering both the leaderboard and the executive summaries, the selection panel chose three joint winners:
- Epsilon-greedy (Delft University of Technology)
- Lanterne-Rouge-BOKU-AIT (University of Natural Resources and Life Sciences, Vienna, and Austrian Institute of Technology)
- VUltures (Vrije Universiteit Amsterdam).
Additionally, Team AIM-Mate were highly commended for their efforts. The Net Zero Technology Centre, which sponsored the challenge, will hold a webinar on 28 March 2022 with the winning teams. The final leaderboards can be viewed at the challenge repository. Thank you to all who participated.
22 November 2021 – Qualitative and quantitative evaluation
As the environment had reached a suitable state of development, this week the group discussed possible evaluation criteria for the challenge. The RangL project aims to collect best practice in the application of RL (and also other optimisation approaches) to control problems in industry. Thus while it would be possible to use only leaderboard scores, we decided to ask participants to submit a one-page executive summary of their approach, which would be considered in the overall evaluation.
8 November 2021 – Lagrangian formulation
In RL, agents learn from experience. However, if rewards occur only after a long sequence of actions, it can be difficult for an RL agent to associate that sequence with the eventual reward. An example is a game of chess in which the reward is simply 1 for winning and 0 for losing. Similarly, if problem constraints are handled by applying a large penalty when a constraint is breached, this can make reinforcement learning challenging. To address this, the group experimented with a Lagrangian reformulation, which transforms the constrained problem into an unconstrained one by modifying the reward function. As a result, the constraint on job numbers was replaced by a reward term proportional to the number of jobs created at each step.
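The difference between the two ways of handling the jobs constraint can be sketched as follows. This is an illustration only: the function names, the penalty size, the multiplier value and all numbers are assumptions, not the reward used in the challenge.

```python
def penalty_reward(profit, total_jobs, jobs_target, is_final_step, penalty=1e6):
    """Constraint handled by one large penalty, applied only at the final step."""
    if is_final_step and total_jobs < jobs_target:
        return profit - penalty
    return profit


def lagrangian_reward(profit, jobs_created, multiplier):
    """Constraint folded into the reward at every step through a multiplier."""
    return profit + multiplier * jobs_created


# Illustrative numbers only: profit and jobs created over three steps.
profits = [10.0, 12.0, 8.0]
jobs = [3.0, 0.0, 5.0]
multiplier = 0.5  # hypothetical value; in practice it would need tuning

shaped = [lagrangian_reward(p, j, multiplier) for p, j in zip(profits, jobs)]
print(shaped)
```

With the penalty form the agent receives no signal about jobs until the final step, whereas the Lagrangian form gives feedback at every step, which is what makes the association between actions and outcomes easier to learn.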
18 October 2021 – Correlations
Typically a spreadsheet will not include any model of randomness – in other words, it is deterministic. In reinforcement learning it can be straightforward to include randomness, and this is one of the strengths of RL.
We will leave the choice between deterministic (without randomness) and stochastic (with randomness) modelling for another blog post. The interesting point for us this week was correlations: how we describe the extent to which random factors tend to move together. In deterministic modelling there is simply no need to think about how, for example, the prices of natural gas and blue hydrogen (which is derived from natural gas) move together. In contrast, in stochastic modelling different correlations could even drive different solutions. Fortunately, in this project we have the luxury of discussions with some of the creators of the Integrated Energy Vision (the spreadsheet model on which the challenge is based), so correlations can be chosen in an informed way.
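To make the idea of correlated random factors concrete, the sketch below draws pairs of random shocks with a chosen correlation, using the standard construction of one shock as a weighted sum of the other and an independent term. The correlation value of 0.8 and the labels `gas` and `hydrogen` are assumptions for illustration, not parameters of the challenge environment.

```python
import math
import random


def correlated_shocks(rho, n, seed=0):
    """Draw n pairs of standard normal shocks whose correlation is rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        gas = z1
        # Mixing z1 with an independent z2 gives unit variance and correlation rho.
        hydrogen = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        pairs.append((gas, hydrogen))
    return pairs


def sample_correlation(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


rho = 0.8  # hypothetical correlation between the two price shocks
estimate = sample_correlation(correlated_shocks(rho, n=20000))
print(round(estimate, 2))
```

Setting `rho` to zero recovers independent shocks, so the same environment code can be used to test how sensitive a solution is to the assumed correlation.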
4 October 2021 – Direct actions
Today the group agreed a modification to the challenge outline: the RL agent will now directly specify the rate of deployment of offshore wind, blue and green hydrogen. This is achieved by allowing the RL environment to interact directly and repeatedly with the IEV spreadsheet model, both simplifying the approach and increasing its transparency and interpretability. Initial results from training an RL agent with this environment were shared and sense-checked.
20 September 2021 – Project video
Having decided the general shape of the challenge, today the project group agreed to begin working in an agile fashion through a GitHub project board.
Excitingly, RangL was also invited to be part of the Net Zero Technology Centre's virtual showcase "Road to Glasgow: Destination Net Zero" at the 26th UN Climate Change Conference of the Parties (COP26) in Glasgow in November 2021. A virtual exhibition booth will include "meet the developers" live sessions and a project video explaining the Pathways to Net Zero challenge.
19 July 2021 – Reward functions
In reinforcement learning, the agent learns to maximise the rewards it receives. The reward function is therefore an integral part of the problem statement, and this week’s efforts centred on finalising its exact form. Given the aims of the study, the project group decided to include the cost of total carbon emissions in the reward alongside UK energy sector profits (that is, energy revenues minus capital, operating and decommissioning costs).
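The structure of such a reward can be sketched as profit minus a carbon cost. The function below is an assumption-laden illustration: the parameter names, the single `carbon_price` factor and the example figures are hypothetical, and the exact form used in the challenge may differ.

```python
def step_reward(revenue, capex, opex, decommissioning, emissions, carbon_price):
    """Profit minus the cost of carbon emissions for one time step."""
    # Profit: energy revenues minus capital, operating and decommissioning costs.
    profit = revenue - (capex + opex + decommissioning)
    # Carbon cost: emissions valued at an assumed price per unit emitted.
    carbon_cost = carbon_price * emissions
    return profit - carbon_cost


# Illustrative figures only, in arbitrary units.
example = step_reward(
    revenue=100.0,
    capex=30.0,
    opex=20.0,
    decommissioning=5.0,
    emissions=2.0,
    carbon_price=10.0,
)
print(example)  # 45.0 profit minus 20.0 carbon cost gives 25.0
```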
The reward function can also be used to place constraints on acceptable solutions. While the shift to zero-carbon technologies can lead to increased employment in the long term under our modelling, it is important to ensure that job numbers are also managed in the short and medium term.
It was agreed that recent volatility in energy market prices highlights the importance of incorporating randomness in the RL environment. In addition to reflecting real-world uncertainty, this allows the agent to learn to adapt under a variety of scenarios.
1 July 2021 – Pathways to Net Zero RL environment
RangL aims to apply reinforcement learning (RL) to real-world industrial problems by involving participants from the wider AI community. Today, our focus was therefore on developing an appropriate RL environment for the Pathways to Net Zero challenge.
The objective is to find optimal deployments for technologies such as offshore wind, blue and green hydrogen, and carbon capture and storage. These technologies will be instrumental in reaching the UK’s target of net zero carbon by 2050.
After brainstorming, we opted to take Breeze, Gale and Storm as baseline scenarios from which others can be built. An agent will interact with the RL environment by choosing a mix of those scenarios and also by varying the speed with which they are implemented. For instance, earlier deployment reduces lifetime emissions but generally implies higher capital costs. Solutions will also need to meet some non-monetary constraints, e.g. balancing job creation in new technologies against the loss of roles in decommissioned infrastructure. We will also work with the Net Zero Technology Centre and ORE Catapult to extend the Integrated Energy Vision appropriately, so that lifetime emissions and their social cost can be considered in the RL reward function.
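One way to picture an action that chooses a mix of baseline scenarios is as a weighted average of their deployment profiles. The sketch below assumes this interpretation; the function `blend_pathways` is hypothetical and the numbers in `BASELINES` are placeholders, not values from the Integrated Energy Vision. Varying the speed of implementation is not covered here.

```python
# Placeholder deployment profiles over three periods, in arbitrary units.
# These are NOT figures from the Integrated Energy Vision model.
BASELINES = {
    "Breeze": [1.0, 2.0, 3.0],
    "Gale": [2.0, 4.0, 6.0],
    "Storm": [4.0, 8.0, 12.0],
}


def blend_pathways(weights, pathways):
    """Return the weighted average of the baseline pathways, period by period."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    periods = len(next(iter(pathways.values())))
    return [
        sum(weights[name] / total * pathways[name][t] for name in pathways)
        for t in range(periods)
    ]


# An action that mixes Breeze and Gale equally and ignores Storm.
action = {"Breeze": 0.5, "Gale": 0.5, "Storm": 0.0}
mixed = blend_pathways(action, BASELINES)
print(mixed)
```

Normalising the weights means any non-negative action vector maps to a valid intermediate pathway, which keeps the action space simple for a learning agent.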
21 June 2021 – Pathways to Net Zero project kick-off
The RangL competition platform exists to accelerate progress in data-driven control problems for industry, and today marks the real beginning of that journey.
The purpose of the Net Zero Technology Centre is to develop and deploy technology for an affordable net zero energy industry, and the Offshore Renewable Energy (ORE) Catapult is the UK’s leading technology innovation and research centre for offshore renewable energy.
Today, colleagues from the Net Zero Technology Centre, ORE Catapult and the RangL team gathered to make a start on the Pathways to Net Zero challenge. The agenda focused first on introducing the competition platform and then on understanding the Net Zero Technology Centre / ORE Catapult Integrated Energy Vision (IEV) model, on which the challenge will be based.
The IEV is the result of a major modelling exercise undertaken collaboratively by the Net Zero Technology Centre and ORE Catapult, and addresses the UK's vision of achieving net zero carbon emissions by 2050 for the North Sea offshore energy industry. The range of possibilities is illustrated by three imagined pathways, Breeze, Gale and Storm, each addressing the four main technology pillars of offshore energy: offshore wind, oil and gas, hydrogen, and carbon capture and storage (CCS).
The Pathways to Net Zero challenge aims to build on the IEV by first allowing a range of intermediate pathways between Breeze, Gale and Storm, then defining a criterion to measure the quality of each pathway in a specific sense. Challenge participants will be invited to apply reinforcement learning, or any other approach of their choice, to find the ‘best’ pathway. The challenge will be made more realistic and difficult by the inclusion of uncertainty over future parameters such as energy revenues and technological progress.
18–25 January 2021 – First RangL Challenge
From 18 to 25 January 2021 the RangL team fulfilled a long-held ambition: to run a generation scheduling challenge.
This ‘look-ahead mode’ generation scheduling was one of the first motivations for RangL, when the project was
The competition itself was heavily oversubscribed, with applicants from Argentina, Denmark, the Netherlands, Italy,
We recently argued on the Turing blog that as the world reopens following the pandemic, we will need to make more