Imagine you applied for your dream job but didn’t even get an interview. Now, you’ve found out that the company you applied to uses artificial intelligence (AI) to process its job applications. Would you feel mistrustful of the technology used to inform the decision? If all we understand is that our data went into an algorithm, something happened to it and then a decision popped out – a ‘black box’ of AI – how can we really know that we’ve been treated fairly?
In the UK and abroad, The Alan Turing Institute is leading conversations on ethical issues just like these. Projects funded by the Turing’s AI for science and government (ASG) programme are helping to establish best practice in responsible research and innovation (RRI) in data science. All of these projects centre not on the technologies themselves but on the people who have the most to gain or lose from them.
ASG-funded Project ExplAIn is helping organisations to understand how and why they should explain the decisions made by their AI systems. As Morgan Briggs, a Research Associate at the Turing, points out, “it’s about who’s on the other end of the explanation. People want to know how a decision was made about their data using AI.” It’s this broader view of AI, extending beyond the purely technical, that shapes the project’s ‘stakeholder-first’ perspective. To craft ExplAIn’s comprehensive guidance ‘Explaining decisions made with AI’ (2020), the Turing team, including Director of Ethics and Responsible Innovation Research David Leslie, collaborated directly with members of the public, scientists and industry experts in a series of workshops and roundtables. The guidance aims to assist organisations in building ‘explainability’ into AI systems from the start and tailoring explanations to different audiences. It is helping the UK government to set standards in ‘AI assurance’, which will in turn help to ensure that we can place our trust in AI systems.
Two years on, the team has developed workbooks that use practical examples – like the job application processing example above – to help organisations make use of the guidance. Briggs and Leslie piloted these workbooks in a series of virtual and in-person workshops. Meanwhile, Keeley Crockett, an AI ethics researcher at Manchester Metropolitan University, has been using the guidance in AI ethics workshops for small and medium-sized enterprises in Manchester, as part of a £3 million AI Foundry project – 126 companies have taken part so far. She’s also using case studies developed through ExplAIn as teaching resources to give her master’s students in data and AI ethics a grounding in explainable AI. “We’re getting these master’s students to understand more about explainability as a unique selling point for their future employment within companies,” she says.
Overseas, the guidance has already informed recommendations for explaining AI put together by the US National Institute of Standards and Technology. According to Briggs, “there’s mutual investment and interest in this idea of explainability” and of placing AI in its wider, social context.
Project ExplAIn emerged from official UK government guidance on responsible AI for the public sector, developed by Leslie in partnership with the Office for Artificial Intelligence. This internationally recognised guidance, ‘Understanding AI ethics and safety’ (2019), has been put into practice by at least a dozen different government departments. Its purpose is to help guard against some of the potential harms of AI within public sector settings. When AI is used to help improve cancer diagnosis, for instance, patients need to be clear about what the AI system is doing and why, and who is liable for any mistake. Leslie and Research Assistant Cami Rincon are now collaborating with government and public sector partners to update the guidance and, through a bespoke training programme, are providing the tools to embed it within public sector AI projects for societal benefit.
According to Leslie, this guidance was “the wellspring” for the values and principles shaping other Turing projects, including a ground-breaking collaboration with Camden Council that resulted in the first ever data charter co-created by a local authority and its residents. In this community-owned project, Ethics Fellow Chris Burr and Rincon worked with 20 Camden residents to explore key concepts in data ethics, helping the residents to define – through their own deliberations – the principles that now guide how the council collects and uses residents’ data. As the council’s Data Custodian, Brendan Kelly, explains, the charter sits at the heart of all its work with data. “Every data project goes through an ethical assessment that has been written out because of the principles in this charter,” he says. He adds that the residents went away with a better understanding of how data is used in a council setting, and more trust in what Camden is doing with their personal data.
An open book
Another group grappling with difficult issues around data science is data scientists themselves. Progress in science depends on reproducibility: using transparent data and methods so that experiments can be checked and repeated, and evidence built upon. But this isn’t always the case in AI and data science, where code and details of complex models often remain unpublished. The Turing strives to work in a different way, embedding reproducibility into its data projects from the start so that innovative new tools and software can be shared and adapted. One of the central functions of the Research Application Manager, a key Turing role, is to help researchers adopt open-science practices so that the outcomes of their work reach more diverse users.
However, the Turing can’t change the culture of data science alone – that requires a more coordinated effort. So, since 2019, ASG-funded project The Turing Way has been supporting the data science community at large to lay down the principles and practices of reproducible research in an openly available, online handbook. The project shares much the same ethos as ExplAIn, with project co-leads Malvika Sharan and Kirstie Whitaker noting that its ‘sociotechnical’ perspective puts greater emphasis on people than technology.
The Turing Way now has over 350 authors and contributors, with many collaborating during regular online co-working events. It is at the forefront of a shift in how data science is done, influencing UK government strategy on reproducibility in data analysis and informing the Health Foundation (which tackles real-world health and social care issues through data analytics) as it adopts more open practices. The Turing Way is also used internationally. Biostatistician and founder of the Latin American community MetaDocencia, Laura Acion, uses The Turing Way to explain responsible data use in her work with Argentinian data projects, including ARPHAI, which focuses on using AI and data science in policy-making to tackle epidemics.
At NASA HQ, meanwhile, Program Scientist Chelle Gentemann recognises The Turing Way as a key resource for the space agency’s $40 million, five-year mission Transform to Open Science (TOPS), which aims to get NASA scientists learning about and applying principles of open science. “As we go and talk to NASA scientists, they ask us for resources,” says Gentemann. “I point them to The Turing Way.” As well as showing scientists how to make their work reproducible, she says, it shows them how to make it more inclusive, so that more people can contribute.
TOPS has also put together a diverse team of experts to develop an open science course for scientists – the OpenCore Curriculum – covering aspects such as the ethos of open data, and best practices for data-sharing. “Rather than reinventing the wheel”, as one of the team, Christopher Erdmann, former Assistant Director for Data Stewardship at the American Geophysical Union, puts it, they were able to draw from The Turing Way’s open materials.
Open resources are also at the core of ASG-funded project, the Turing Commons. Targeted not just at data scientists but at anyone who needs to understand responsible data use, this incorporates the ‘Citizen’s guide to data’ that Burr designed to support his work with Camden residents. Burr also developed guidebooks and activities on ‘Responsible research and innovation’ (RRI) and ‘Public engagement of data science and AI’ that formed the basis for two live, online training courses attended by at least 50 people worldwide. Mayara Carneiro, a tech lawyer in Brazil, took the first RRI course in November 2021. She says it “helped me to have a better notion of what I should take into account when building ethical guidelines and requirements” for tech companies using AI.
Burr and Research Assistant Claudia Fischer are now refining and expanding the resources to support more interactive and tailored content, and in the tradition of the Turing, they’re doing it collaboratively. According to Burr, they’ve had “significant interest from UK universities” in more openly accessible resources for students and researchers across different specialisms, especially on RRI in data science and AI. “Many universities reach out to us to say, ‘You’re creating free, open-access resources. Can we work together?’,” he says. So, for example, partnering with the University of Edinburgh’s Biomedical AI Centre for Doctoral Training, they are now co-designing biomedical case studies that will be shared via the redesigned Turing Commons platform.
RRI is also an important aspect of existing ASG-funded training offered by the Turing’s Research Engineering Group (REG). As Senior Research Data Scientist Federico Nanni explains, whilst many students learn to code during their PhDs, they don’t always learn “best practices” – how to build software that other researchers can reuse and extend. Nanni says the benefit of delivering the training via the REG is that its members work across multiple academic disciplines, offering diverse perspectives for the 40 students who have so far taken the ‘Research data science’ course.
It’s not just PhD students who need data science training, though. REG Data Research Scientist Lydia France had this realisation when collaborating with the Francis Crick Institute on some training for biomedical researchers. It became apparent that some project leaders were avoiding computational work altogether. “They don’t know where to begin because there is no training,” she says. So France redirected her efforts at senior biomedical researchers, mining the wisdom within The Turing Way to build free-to-access masterclasses on how to manage and supervise open and reproducible data projects.
Now France and Nanni are targeting skills gaps across other subject areas, including the humanities, leveraging the broad expertise within the REG. “There are definitely lots of disciplines that are facing challenges because they are collecting such gigantic amounts of data,” Nanni says. “I think that’s where we can contribute the most.”
So, as the Turing sets the standards for more ethical, reproducible and collaborative data science, it is also delivering the skills needed for this more responsible approach to the field. At the same time, community-led and participatory projects funded by the ASG programme are giving more people the power to influence how data-driven technologies are developed, and how their personal data is used, with public benefit to the fore. Better data ethics means putting people, rather than technology, first.
Header illustration: Jonny Lighthands