Abstract
This report takes steps towards improving search and discovery of resources within the UK Government Web Archive (UKGWA) for future archive users, and explores how this vast collection could begin to be unlocked for research and experimentation by approaching it as data (i.e. as a dataset at scale). The UKGWA has independently begun to examine how useful modelling the hyperlinked structure of its collection is for advanced corpus exploration; this collaboration aims to test algorithms capable of searching for documents via the topics they cover (e.g. ‘climate change’), envisioning a future convergence of these two research frameworks. The UKGWA is a diachronic corpus, ideal for studying how topics emerge and feature across government websites over time, which in turn indicates engagement priorities and how these change.
Citation information
Data Study Group team. (2021, June 18). Data Study Group Final Report: The National Archives, UK. Zenodo. https://doi.org/10.5281/zenodo.4981184
Additional information
David Beavan, The Alan Turing Institute
Fazl Barez, University of Edinburgh
Mark Bell, The National Archives, UK
John Fitzgerald, University of Oxford
Eirini Goudarouli, The National Archives, UK
Konrad Kollnig, University of Oxford
Barbara McGillivray, The Alan Turing Institute and University of Cambridge
Federico Nanni, The Alan Turing Institute
Tariq Rashid, Digital Dynamics
Sandro Sousa, Queen Mary University of London
Tom Storrar, The National Archives, UK
Kirill Svetlov, Saint Petersburg State University
Leontien Talboom, The National Archives, UK and University College London
Aude Vuilliomenet, University College London
Pip Willcox, The National Archives, UK