How can machine learning help us unlock historical maps?

A project at the Turing is using machine learning to ‘read’ map text and create rich new datasets for humanities researchers and heritage professionals

Friday 25 Mar 2022

Historical maps contain a vast wealth of knowledge. They provide a unique record of features, landscapes and settlements that may no longer exist, or that have since been dramatically transformed. Maps represent a significant body of global cultural heritage, and they are being digitally scanned at a rapid pace around the world. However, tapping into this data can be challenging; map text, in particular, remains an almost entirely unused source of information.

The Alan Turing Institute’s ‘Machines Reading Maps’ (MRM) project is hoping to change this. We are using machine learning to ‘read’ and enrich thousands of digitised maps, offering users the potential to interpret maps on a large scale. Two challenges lie at the heart of this project: detecting text on scanned maps and making that text meaningful.

To address these challenges, we have brought together two tools in a single online platform. The first, mapKurator, is a machine learning (ML) pipeline developed at the University of Southern California and the University of Minnesota that automatically extracts text from historical maps. It combines text detection, recognition and linking to external knowledge bases such as Wikidata and OpenStreetMap, enabling the text to be associated with unique or generic features such as mountains, towns and roads. The second is Recogito, a tool for manually annotating maps.
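To make that detect–recognise–link flow concrete, here is a minimal sketch in Python. Everything in it is illustrative: the function names, the class and the toy gazetteer are assumptions for exposition, not mapKurator’s actual API, and the model stages are stubbed with placeholder outputs.

```python
# Hypothetical sketch of a three-stage map-text pipeline:
# detect text regions, recognise the characters, then link each
# string to an external knowledge base. Not mapKurator's real API.

from dataclasses import dataclass
from typing import Optional

@dataclass
class MapText:
    text: str                     # recognised string, e.g. "Ben Nevis"
    bbox: tuple                   # bounding box on the scan: (x, y, w, h)
    entity: Optional[str] = None  # knowledge-base match, e.g. a Wikidata ID

# Toy gazetteer standing in for Wikidata/OpenStreetMap lookups
GAZETTEER = {"ben nevis": "wikidata:Q0"}  # placeholder identifier

def detect_text_regions(scan):
    """Stage 1: a detection model proposes text bounding boxes (stubbed)."""
    return [(120, 45, 80, 20)]

def recognise_text(scan, bbox):
    """Stage 2: a recognition model transcribes each box (stubbed)."""
    return "Ben Nevis"

def link_text(text):
    """Stage 3: match the recognised string against a knowledge base."""
    return GAZETTEER.get(text.lower())

def read_map(scan):
    """Run all three stages and collect structured annotations."""
    annotations = []
    for bbox in detect_text_regions(scan):
        text = recognise_text(scan, bbox)
        annotations.append(MapText(text, bbox, link_text(text)))
    return annotations

print(read_map("scanned_map.png"))
```

The point of the final linking stage is that each recognised string stops being a bare label on an image and becomes a structured record that can be queried, aggregated and joined with other datasets.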

Recogito annotation interface detail. Map image courtesy of the National Library of Scotland.

By combining these tools, our platform takes advantage of both automated annotations generated via computer vision and machine learning, and expert knowledge gathered through manual annotation. This provides an accessible and effective way of working on maps at different scales, in terms of both number and detail.

A major goal of the project is the transformation of text on Ordnance Survey (OS) maps of Great Britain (National Library of Scotland and British Library) and Sanborn fire insurance maps (Library of Congress) into searchable data. This will make it possible for users to explore these maps based on names or the types of features they’re interested in (e.g. “find all maps containing a library or a hospital during the 1800s”). In this way, the project is creating an unprecedented amount of free and reusable data. 
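As a rough illustration of what this enables, once map text is structured data such a query reduces to a simple filter. The sketch below assumes a small, invented table of extracted annotations; the column names and rows are illustrative, not the project’s actual schema.

```python
# Hypothetical sketch: querying extracted map text for
# "all maps containing a library or a hospital during the 1800s".
# The schema and rows below are invented for illustration.

import pandas as pd

annotations = pd.DataFrame([
    {"map_id": "OS-sheet-001", "text": "Public Library",  "feature": "library",      "year": 1894},
    {"map_id": "OS-sheet-002", "text": "Royal Infirmary", "feature": "hospital",     "year": 1888},
    {"map_id": "Sanborn-017",  "text": "Engine House",    "feature": "fire station", "year": 1905},
])

# Keep only libraries and hospitals dated to the 19th century
hits = annotations[
    annotations["feature"].isin(["library", "hospital"])
    & annotations["year"].between(1800, 1899)
]
print(sorted(hits["map_id"].unique()))  # ['OS-sheet-001', 'OS-sheet-002']
```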

To test the performance and usability of the new, integrated annotation platform, we expanded our collaboration with the National Library of Scotland (NLS) and enrolled a large number of passionate volunteers in the endeavour of creating gold-standard data to fine-tune the ML model. The NLS has extensive experience with volunteer map annotation, for example through the very successful GB1900 project. This time, it will use the MRM platform to transcribe place names appearing on 19th-century OS maps of Edinburgh, during a community event this spring. Learn more about this NLS project here.

Detail of the OS map of Somerset (XIV.5, published 1888). Image courtesy of the National Library of Scotland.

Building on the experience of working with the public through the NLS crowdsourcing event, we are excited to announce a new collaboration with the David Rumsey Map Collection, one of the largest collections of digitised, georeferenced maps in the world. Our ultimate goal is to enable users to search a library catalogue for maps based on the text they contain. Along the way, volunteers may contribute to improving the machine-generated data, and the project team will refine its methods using a diverse collection of 60,000 maps. Our aim with the Rumsey collection is to develop an openly available prototype that allows map collections to be searched by their text, just as digitised newspapers already can be.

Deciphering the maps in each of these distinctive collections presents challenges from a computer vision perspective and creates new evidence for answering historical research questions. For example, we can use mapKurator outputs from Ordnance Survey maps to examine how choices about representing antiquities reflect Victorian attitudes towards the past. And the Rumsey collection, with its nearly global coverage, lends itself well to studying how railways and other transport infrastructure are represented on maps around the world.

These examples show how we hope our work will enhance the accessibility and discoverability of features on historical maps, enabling humanities and heritage professionals to streamline and scale up their map analysis.

Find out more about the project here.

Machines Reading Maps was originally funded by the Arts and Humanities Research Council (AHRC) and the National Endowment for the Humanities (NEH), and is a collaboration between The Alan Turing Institute, the University of Southern California Libraries, the University of Minnesota, the National Library of Scotland, the British Library and the Library of Congress. The project is now also supported by a generous gift from David Rumsey and by additional funding from The Alan Turing Institute.