Data/Culture: Building sustainable communities around Arts and Humanities datasets and tools

Project status

Ongoing

Introduction

Data/Culture is a sandbox for the re-use of data and tools in the humanities and arts, in ways that develop high-quality research and strong collaborative communities. 

In its pilot year (2023-24), the team is focused on: 

  • Building communities of historians around 1) the Seshat: Global History Databank and 2) the Living with Machines project, whose outcomes include the MapReader software library, as well as data and tools related to historical British newspapers.
  • Establishing a stronger network of Research Software Engineers for the Arts and Humanities, driving collaborative innovation, embedding skills, and delivering a roadmap for a national capability tailored to Arts and Humanities needs.
     

Explaining the science

Data/Culture is a unique experiment in arts and humanities research: it provides extended support for projects that have invested expertise, time, and of course funding in the development of tools and data for digital research. At the same time, it is an environment in which to scope the needs of digital research infrastructure in the arts and humanities so that future investment in these areas has a longer lifespan and leads to significant research findings across these disciplines.

Because the current funding landscape is project-focused, tools and data often never find an audience beyond the original project team, so the return on investment is poor. Even with the best due diligence, new projects often reinvent the wheel, building the same or very similar datasets and tools, because they do not know about, or cannot access, open research components that already exist. At other times, projects develop data so bespoke that it is nearly impossible to re-use.

If those components could be designed with re-usability in mind up front, if they were well packaged and documented, if communities of users and maintainers were actively built around them, and if skills to exploit them were embedded in disciplinary pedagogy, we could create the basic components of an open and modular digital research infrastructure that would help accelerate research innovation.

This infrastructure could take many forms; however, knowledge about what works specifically for the arts and humanities is sparse and anecdotal. What is needed is a robust evidence base from which to derive standards and models of good practice. Learning from initiatives outside the UK (such as DARIAH across Europe, Huma-Num in France, or CLARIAH in the Netherlands) as well as efforts within the UK (e.g. Towards a National Collection), we will collaborate with colleagues to shape a future where the arts and humanities are as well-placed to pursue digital research as other disciplines.

Data/Culture is therefore a space in which to demonstrate what happens when we build and re-use data and tools with the aim of contributing to digital infrastructure for the arts and humanities. 
 

 

Project aims

In our pilot year, we focus on the open data and software outputs of two projects, Living with Machines (LwM) and Seshat: Global History Databank, to create examples of open, modular digital research infrastructure that can be re-used in other research contexts.

During this pilot phase we will: 

  • Drive excellence in the development of research data and software in the arts and humanities by developing LwM and Seshat outputs into resources that have utility beyond these projects
  • Accelerate research innovation by re-using and documenting data and software components
  • Create sustainable communities of practice that maintain and develop these outputs in new directions
  • Establish a Research Software Engineer network for the arts and humanities
  • Create a roadmap for a national Research Software Engineer capability

To meet these aims, during the pilot phase of this initiative we will work with three key communities:

  • Research Software Engineers with experience working with arts or humanities data, their employing research organisations, as well as those looking to move into this profession
  • Historians who would benefit from the chosen data and software
  • Libraries, archives and other data holders that own or host humanities- and arts-related collections and are interested in questions about access to and re-use of data, IP, and the ethics of working with their collections as data using AI methods

Recommendations about how to build and maintain these three communities will inform future phases of this activity, in which the scope will be broadened to a wider portfolio of projects, scholarly disciplines and data holders, who will be engaged and rotated in and out throughout the lifespan of the full project. 
 

 

Applications

The main application will be to spin out the lessons learned from the pilot phase into other areas of the Arts and Humanities. The Seshat dataset will be linked with other world history datasets and the communities surrounding these datasets. 

Living with Machines tools and data will be showcased in a series of workshops where participants can learn about:

  • How to build data that is “re-use” friendly
  • Where to access historical collections for digital research, with a focus on maps and newspapers
  • How to ask key questions about data access, copyright, and licensing as part of a research project
  • What Machine Learning, Computer Vision, and Natural Language Processing can do to inform historical research
  • What kind of research is possible with digitised primary source collections in the UK
     

How can you get involved?

  • Apply for our Autumn workshop: 30 September - 2 October 2024! This workshop will be hybrid. During this workshop, historians and others interested in historical sources will have an opportunity to learn about and use open software and data created by Living with Machines researchers to lower barriers to working with digitised newspapers and maps.
  • Join a MapReader community call! Information on these is available here.
  • Find examples of our data on Zenodo, models on Hugging Face, and software on GitHub.
  • Stay tuned for new research publications, data papers, and datasets emerging from the team!
     

Organisers

Dr Pieter Francois

Theme Lead for Arts, Humanities and Cultural Heritage, and Associate Professor in Cultural Evolution at the University of Oxford

Researchers and collaborators

Dr Pieter Francois

Theme Lead for Arts, Humanities and Cultural Heritage, and Associate Professor in Cultural Evolution at the University of Oxford

Dr Kalle Westerling

Research Application Manager, Turing Research and Innovation Cluster in Digital Twins (TRIC-DT)