This document reports on an Alan Turing Institute Data Study Group (DSG) that investigated a Reinforcement Learning (RL) challenge posed by the Defence Science and Technology Laboratory (Dstl).

Dstl is “the science inside UK defence and security.” As such, it provides the evidence that Defence requires to make effective decisions. For example, a planning exercise seeks to optimise the use of available resources to achieve a desired effect. Simulations and games can be helpful tools in answering the required questions.

RL has been shown to generate effective (even super-human) agents for games such as Go and StarCraft. Dstl would like to investigate RL in the context of games that are relevant in the defence sphere, and in particular whether RL can provide solutions that are adaptable to changes in the configuration of the game. The aim of this DSG is thus to investigate the effectiveness of RL techniques when the rules of the game change between the training phase and the deployment phase.

Citation information

Data Study Group team. (2021, July 22). Reinforcement Learning Study Group Report – February 2021. Zenodo. https://doi.org/10.5281/zenodo.5121558

Additional information


David Leslie is a Professor in the Mathematics and Statistics Department at Lancaster University, with research interests in statistical learning, decision-making and game theory. Following a PhD at the University of Bristol entitled 'Reinforcement Learning in Games', he has continued research in this field throughout his career to date. This includes the ALADDIN project, a large strategic partnership between BAE Systems and EPSRC involving researchers from Imperial College, Southampton, Oxford, Bristol and BAE Systems, in which ideas from game-theoretical learning were used to develop mechanisms for decentralised control. He also carries out research in bandit algorithms, recommender systems and Bayesian optimisation, including during a period working with Prowler.io (now Secondmind.ai). He is currently carrying out research on decision-making in the NG-CDI project, an EPSRC and BT strategic partnership, and co-leads the EPSRC-funded Data Science of the Natural Environment project at Lancaster University and the Centre for Ecology and Hydrology.

Dr Gregory Palmer is a PostDoc at the L3S Research Center. He completed his PhD in 2019 in the Department of Computer Science at the University of Liverpool under the supervision of Professor Karl Tuyls and Professor Rahul Savani. Throughout his PhD, Gregory developed a number of approaches that enable independent learners to overcome multi-agent learning pathologies within fully collaborative team games. He also worked with the HAL allergy group on automating the inspection of opaque liquid vaccines. Within IIP-Ecosphere, Gregory is responsible for leading the data think tank, and works on Robust Adversarial Reinforcement Learning for industrial environments within the Think Tank AI and Production.

James Butterworth is a third-year PhD student in the Computer Science department of the University of Liverpool under the supervision of Professor Rahul Savani and Professor Karl Tuyls. Throughout his time at Liverpool, he has published in the areas of Evolutionary Algorithms, Neuroevolution, Robotics and Argumentation. His current research themes involve using generative models to learn genotype-phenotype maps such that evolutionary search occurs in a lower-dimensional, and more precise, space, resulting in reduced training times. During his PhD, James completed a six-month internship at Jürgen Schmidhuber’s AI startup NNAISENSE, where he used evolutionary algorithms to optimise non-differentiable geometric models.

Rahul Savani is a Professor of Computer Science at the University of Liverpool. His research interests include Algorithmic Game Theory, in particular equilibrium computation, and (multi-agent) Reinforcement Learning.