This group unites international researchers and heritage professionals who have an interest in using digitised image collections (maps, photographs, newspapers, books) in computer vision tasks. Applying computer vision (CV) to historical datasets raises issues of provenance and bias, as well as processing challenges distinct from those posed by the recent or born-digital images used in most computer vision work. We offer researchers at the Turing and beyond an opportunity to establish connections and build this new interdisciplinary field together. We focus on shared practices in data science around computer vision, providing a much-needed centre of gravity for a growing, otherwise disparate community.
Explaining the science
Computer vision refers to a range of tasks and methods aimed at allowing computers to work effectively with images. Common computer vision tasks include classifying images into different categories and segmenting images into smaller parts based on image content, for example, detecting which parts of an image contain a person. Computer vision methods have been applied across a broad range of scientific and engineering domains, including self-driving vehicles, medical image processing and the recognition of text in digitised documents (optical character recognition). There is growing interest in applying and extending these techniques to heritage materials that have been digitised, including maps, books, newspapers and paintings. This raises questions about how well computer vision methods developed for other types of images perform on digitised heritage content, as well as questions about how we can work effectively and responsibly with collections at scale through automated methods.
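To make the difference between the two tasks named above concrete, here is a minimal sketch in Python. It is an illustration only: the tiny 4x4 greyscale "scan", the threshold of 128, the labels "printed" and "blank", and the function names `segment_image` and `classify_image` are all assumptions made for this example. Simple thresholding stands in for the trained models that real computer vision work would typically use.

```python
# Toy illustration: classification gives one label per image,
# segmentation gives one label per pixel.
# All names, values and thresholds here are hypothetical.

# A hypothetical 4x4 greyscale "scan": low values = dark ink, high values = paper.
SCAN = [
    [250, 245, 240, 250],
    [250,  30,  25, 245],
    [245,  20,  35, 250],
    [250, 240, 245, 250],
]

INK_THRESHOLD = 128  # assumed cut-off between ink and paper


def segment_image(image, threshold=INK_THRESHOLD):
    """Return a mask that is True wherever a pixel is darker than the threshold."""
    return [[pixel < threshold for pixel in row] for row in image]


def classify_image(image, threshold=INK_THRESHOLD, min_ink_fraction=0.1):
    """Return a single label for the whole image, based on its share of ink pixels."""
    mask = segment_image(image, threshold)
    ink_pixels = sum(sum(row) for row in mask)
    total_pixels = sum(len(row) for row in image)
    return "printed" if ink_pixels / total_pixels >= min_ink_fraction else "blank"


if __name__ == "__main__":
    # One label for the whole image.
    print(classify_image(SCAN))
    # One label per pixel, drawn as '#' for ink and '.' for paper.
    for row in segment_image(SCAN):
        print("".join("#" if is_ink else "." for is_ink in row))
```

The point of the sketch is the shape of the output rather than the method: classification returns a single answer for the image, while segmentation returns an answer for every pixel. Digitised heritage material (faded ink, stained paper, historical fonts) is exactly where a fixed threshold like this one would be expected to break down.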
How might heritage institutions and researchers work together to improve accessibility to large collections of images that have been and continue to be scanned?
Challenges: Many libraries, archives, museums, and galleries have restrictions on sharing images of their collections that have been scanned in the last few decades.
Example output: A white paper co-authored by researchers, curators, funders, and third-party digitisation partners about the future of open data access.
How can computer vision methods be used in humanities and heritage contexts?
Challenges: Computer vision literature currently focuses on problems related to certain kinds of relatively contemporary images (web content, photographs, remote sensing data, text with standard fonts).
Example output: Case studies demonstrating open challenges for working with older materials (printed and manuscript maps, woodcuts, newspapers with historical fonts and layouts).
How to get involved
Visit our GitHub page here.