The Alan Turing Institute and the British Library, together with researchers from a range of universities, have been awarded £9.2 million from the UKRI's Strategic Priorities Fund for a major new project.
Led by the Arts and Humanities Research Council (AHRC), ‘Living with Machines’ will take place over five years and is set to be one of the biggest and most ambitious humanities and science research initiatives ever to launch in the UK.
The programme will see data scientists working with curators, historians, geographers and computational linguists to devise new methods in data science and artificial intelligence that can be applied to historical resources, producing tools and software to analyse digitised collections at scale for the first time.
In recognition of the significant changes currently underway in technology, notably in artificial intelligence, the project will take as its focus the century following the first Industrial Revolution and the changes that advancing technology brought to all aspects of society during this period.
Explaining the science
The work is based on a strong collaborative research philosophy that will be methodical, self-reflexive and designed to evolve. The central datasets and research questions involved will drive the development of infrastructure, computational methods and tools. The outcomes of these methods will provide nuance to research questions and in turn research questions will help to hone and improve code, datasets and visualisations. This iterative, cyclical process will help the research team develop best practice for collaboration and exchange, joining the ethos and methods of data science and humanities research.
Engagement with a wider audience will also be planned into the evolving research programme: family and local historians will be invited to engage with the ongoing project and help direct its research questions, methodological approaches and research outputs.
This iterative and collaborative ethos will be central at each stage of the process, and will be structured around ‘laboratories’ formed to address the key methodological challenges posed by the project's research aims and historical questions. These laboratories are:
- Space and time
- Computational methods
Initial research plans involve scientists from The Alan Turing Institute collaborating with curators and researchers to build new software to analyse data drawn initially from millions of pages of out-of-copyright newspaper collections held in the archive in the British Library’s National Newspaper Building, and from other digitised historical collections, most notably government-collected data such as the census and the registration of births, marriages and deaths. The resulting new research methods will allow computational linguists and historians to track societal and cultural change in new ways during this transformative period in British history. Crucially, these new research methods will place the lives of ordinary people centre-stage, rather than privileging the perspectives of decision-makers and public commentators.
‘Living with Machines’ will take a radical approach to collaboration, breaking down barriers between academic traditions, bringing together data scientists and software engineers from The Alan Turing Institute and curators from the British Library as well as computational linguists, digital humanities scholars and historians from universities including Exeter, University of East Anglia, Cambridge and Queen Mary University of London.
The research methodologies and tools developed as a result of the project will transform how researchers can access and understand digitised historic collections in the future.
The insights from the project will provide vital context for present-day debates about the future of work, prompted by the social change caused by the so-called ‘fourth industrial revolution’ of artificial intelligence and robotics. For example, data-driven findings on how attitudes to machines and mechanisation changed during the nineteenth century could help present-day researchers and policy-makers to understand current public attitudes towards new technologies, such as autonomous vehicles or the use of artificial intelligence and robotics in everyday transactions.
Key starting points for the project include marshalling data, developing workflows and methods for ensuring data quality, developing intuitive interfaces to facilitate collaboration with historians, and launching the collaborative research agenda through project ‘laboratories’. There will be future calls and opportunities for researchers to get involved.
Beelen K, Nanni F, Coll Ardanuy M, Hosseini K, Tolfo G, McGillivray B. (2021). When Time Makes Sense: A Historically-Aware Approach to Targeted Sense Disambiguation. Findings of the Association for Computational Linguistics, pages 2751–2761.
Coll Ardanuy M, McDonough K, Krause A, Wilson D CS, Hosseini K, van Strien D. (2019). Resolving Places, Past and Present: Toponym Resolution in Historical British Newspapers Using Multiple Resources. Proceedings of the 13th Workshop on Geographic Information Retrieval (GIR'19).
Coll Ardanuy M, Nanni F, Beelen K, Hosseini K, Ahnert R, Lawrence J, … McGillivray B. (2020). Living Machines: A study of atypical animacy. The 28th International Conference on Computational Linguistics (COLING 2020).
Coll Ardanuy M, Hosseini K, McDonough K, Krause A, van Strien D, Nanni F. (2020). A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching. 28th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2020).
Hosseini K, Nanni F, Coll Ardanuy M. (2020). DeezyMatch: A Flexible Deep Learning Approach to Fuzzy String Matching. The 2020 Conference on Empirical Methods in Natural Language Processing.
van Strien D, Coll Ardanuy M, Hosseini K, Beelen K, McGillivray B, Colavizza G. (2020). Assessing the Impact of OCR Quality on Downstream NLP Tasks. ICAART 2020 - 12th International Conference on Agents and Artificial Intelligence.
Filgueira R, Jackson M, Terras M, Roubickova A, Beavan D, Hobson T, … Ahnert R. (2019). Defoe: A Spark-Based Toolbox for Analysing Digital Historical Textual Data. eScience 2019.
New crowdsourcing tasks launched
The project recently hit a major milestone in its five-year journey with the launch of new crowdsourcing tasks. The tasks will create a lexicon (a ‘dictionary’ of words) about machines that will aid computational research at scale. They have been designed to unearth new stories, looking beyond what is currently known to reveal a richer picture of our past.
Take part in the tasks and help describe and classify words and phrases by reading small sections of newspaper articles.
A key contributor to the Living with Machines project was Joel Dearden, who sadly passed away this year aged 41. Joel is greatly missed by his former colleagues and collaborators, including Alan Wilson, whose personal reflection on Joel can be read here.