The crisis of reproducibility in science is well known. The combination of ‘publish or perish’ incentives, secrecy around data and the drive for novelty at all costs can result in fragile advances and a great deal of wasted time and money. Even in data science, when a paper is published there is often no way for an outsider to verify its results, because the data from which the findings were derived are not available for scrutiny. Such science is hard to build upon: siloed science is slow science.
That’s one of the reasons funders and publishers are beginning to require that publications include access to the underlying data and analysis code. It’s clear that this new era of data science needs a new cultural and practical approach, one which embraces openness and collaboration more than ever before. To this end, a group of Turing researchers have created The Turing Way – an evolving online “handbook” on how to conduct world-leading, reproducible research in data science and artificial intelligence.
It is also a global community of research engineers, data librarians, industry professionals and research experts in various domains, at all levels of seniority, dedicated to capturing and sharing research best practice, tools and data. The moonshot goal of the project is to make reproducible research “too easy not to do”.
The project launched in December 2018 and is headed by Turing Research Fellow Kirstie Whitaker, Theme Lead for Practices within the Turing’s Research Engineering Programme. “Changing research culture is no small thing,” says Whitaker. “It's wonderful that The Turing Way is a project: it’s an investment from the Turing in a traditional academic environment that doesn’t place enough value on openness.”
How did it start?
Whitaker, a neuroscientist by training, is a high-profile and passionate advocate of open, reproducible research. She saw an opportunity to kick-start the project as part of a major funding package for the Turing that originated with UK Research and Innovation’s Strategic Priorities Fund.
With Whitaker in charge, it’s no surprise that The Turing Way takes an open-source, community-driven approach: it is a ‘meta-project’ of sorts, developed and documented openly from the very start. “I have a vision, and I am driving it forward, but The Turing Way doesn't belong to me, and it isn't written by me. Literally anyone can contribute. That’s the special sauce of the project,” she says.
The Turing Way core team currently consists of 10 Turing researchers, drawn from across the Institute, and a community manager, but the handbook itself already has more than 60 contributors.
“Literally anyone can contribute to The Turing Way. That’s the special sauce of the project.”
Kirstie Whitaker, Turing Research Fellow
Right now, the handbook focuses primarily on the reproducibility of data science, with sections covering open research, version control, research data management, reproducible computing environments and collaboration on platforms such as GitHub and Binder. The book doesn’t just deal with the “how” of things, but also the “why”: the ethos and long-term benefits of reproducible research.
The reproducibility handbook is just the start. The resource will grow to cover other key areas of data science knowledge, with additional handbooks on research design, effective collaboration, communicating and visualising results, the ethics of data science and much more. But all this is only half the story. The real power of The Turing Way will come from its growing, global community of like-minded advocates.
To foster The Turing Way community, Whitaker’s team has organised a number of events, including training workshops and “book dashes”. Each dash begins with dinner and lightning presentations in the evening, followed the next morning by an intense day of collaborative work on the handbook. Dashes were hosted at the Turing and at the University of Manchester in May 2019, with more to come.
The handbook has already inspired UCL’s Institute of Health Informatics Code Club to create a similar online repository of wisdom. “Not gonna lie, this bible is heavily inspired by The Turing Way,” they acknowledge.
“Our ‘bible’ is heavily inspired by The Turing Way”
UCL’s Institute of Health Informatics Code Club
Also keen to encourage researchers to adopt forward-thinking data science tools, the Turing has teamed up with the people behind Binder, a key research platform that lets researchers package and work with data in a convenient, highly shareable way, through customisable computing environments that can be opened from a simple web link. Binder is effectively a remote desktop: built from a GitHub repository, it contains all the data, research notes and analysis software that researchers need to work on complex data, or to give feedback on their students’ work, even from the comfort of their smartphone.
Binder is powered by BinderHub, and in March this year the Turing Way team ran the world’s first ‘Build a BinderHub’ workshop, teaching research engineers and IT professionals how to host their own Binder infrastructure. Private BinderHubs can support researchers who work with data that cannot ethically be shared or with commercially sensitive software. In late 2019, The Alan Turing Institute will join the publicly available myBinder.org federation of BinderHubs to support open and reproducible research around the world.
Care to share?
Something often cited as an obstacle to reproducible science is that it requires scientists to give others access to their precious data. The Turing Way handbook debunks the idea that giving everything away is the main aspect of reproducibility, says Whitaker. “Reproducible research simply means that someone else can get the same result as you. They’ll need access to the data and code to do that, but those resources don’t have to be public. In my opinion, any personal data that was collected by research teams or companies does not belong to them. They are stewards of the data: it belongs to the individuals who gave that data. What’s more, in academic research, particularly that funded by research councils, the person who paid for that data is typically the taxpayer in almost all instances.”
Nevertheless, Whitaker concedes that, in practice, the world does not reward sharing in science. “That's absolutely true. One of the goals of The Turing Way is to shine a floodlight on that. To change this incredibly broken publish-or-perish system that actually works against sharing and making research more effective.”
The Turing Way’s focus is not necessarily on open data but on FAIR data wherever possible: findable, accessible, interoperable and reusable. “The reuse of other people’s data is providing useful insights for new research questions and products, and driving new scientific discoveries,” writes Susanna-Assunta Sansone, Associate Professor in Data Readiness at the University of Oxford, in The Turing Way. Sansone was a co-author of the first article on the FAIR principles, published in the journal Scientific Data. “In research data management, the history is the future,” she observes.
The Turing Way, and the scientific worldview it promotes, appeal particularly to forward-thinking early-career data scientists. “They feel welcomed, they feel like they have found like-minded people in The Turing Way community. They have more confidence in sharing their skills and taking new skills back to their research communities,” says Whitaker.
“Before I joined the Turing, I had never really considered the importance of the reproducibility of my code,” says Turing Enrichment PhD student Maxine Mackintosh. “Now I can’t imagine ever not doing it. It’s great to be in an environment that is supporting us as we learn these difficult skills that are not, yet, part of mainstream academic culture.”
But it’s important that senior people engage too, says Whitaker, even if they don’t necessarily have the most up-to-date technical skills. “We need to share the responsibility for reproducible science. That – along with good collaboration, communication and ethics – should be seen as fundamental to the scientific method.”
What does the future hold?
With funding secured, the goal is for the handbook to become a comprehensive ‘How-To Guide for Data Science’, with many more sections beyond reproducibility. A key section for Whitaker will cover collaboration: “Something that really bothers me is that academic researchers, generally speaking, don’t know how to collaborate effectively. We’re not trained for it. How do you facilitate conversations across disciplines where technical terms and expectations are different, how do you run a participatory event like a hackathon, how do you build an inclusive space where all points of view are respected?”
The group has also begun a fortnightly Online Collaboration Café, where researchers can meet to work and chat.
By the end of 2020, the team expects 20 new chapters of the book to be available, written by more than 200 contributors, and more than 1,000 data wranglers of all stripes to have joined the mailing list. The team also aims to contribute to other open source projects introduced to them through the growing Turing Way community.
The team will also be advocating that scientists should get credit for making their data available. “Collecting data takes expertise, time and money”, says Whitaker. “Yet in academia, we really only prioritise ‘the story’ – the published paper. All research outputs, including data, should be recognised for their longer-term usefulness. That’s how we’ll build up a firm footing of fundamental research to underpin data science.”
The Alan Turing Institute is committed to changing the world for the better through data science and artificial intelligence. With The Turing Way, we are also pledging to change data science itself for the better. You are cordially invited to join us.