Data Study Group Final Report: The National Archives, UK

Discovering topics and trends in the UK government web archive


The challenge we address in this report is to take steps towards improving search and discovery of resources within the vast UK Government Web Archive (UKGWA) for future archive users, and to explore how the UKGWA collection could begin to be unlocked for research and experimentation by approaching it as data (i.e. as a dataset at scale). The UKGWA has independently begun to examine the usefulness of modelling the hyperlinked structure of its collection for advanced corpus exploration; the aim of this collaboration is to test algorithms capable of searching for documents via the topics that they cover (e.g. ‘climate change’), envisioning a future convergence of these two research frameworks. The UKGWA is a diachronic corpus, ideal for studying the emergence of topics and how they feature across government websites over time, which in turn indicates engagement priorities and how these change.

Citation information

Data Study Group team. (2021, June 18). Data Study Group Final Report: The National Archives, UK. Zenodo.

Additional information

David Beavan, The Alan Turing Institute

Fazl Barez, University of Edinburgh

Mark Bell, The National Archives, UK

John Fitzgerald, University of Oxford

Eirini Goudarouli, The National Archives, UK

Konrad Kollnig, University of Oxford

Barbara McGillivray, The Alan Turing Institute and University of Cambridge

Federico Nanni, The Alan Turing Institute

Tariq Rashid, Digital Dynamics

Sandro Sousa, Queen Mary University of London

Tom Storrar, The National Archives, UK

Kirill Svetlov, Saint Petersburg State University

Leontien Talboom, The National Archives, UK and University College

Aude Vuilliomenet, University College London

Pip Willcox, The National Archives, UK