Introduction

In a recent eLife point of view article, Turing research fellows Dr Daphne Ezer and Dr Kirstie Whitaker outline how data science can supercharge every aspect of the scientific life cycle. The paper builds on an event they hosted at the Turing in October 2018, which brought together laboratory scientists, statisticians and social scientists for a day of knowledge sharing and discussion. Their goal was to harness the power of data science to make each stage of the scientific life cycle more efficient and effective.

The scientific life cycle

Traditionally, a researcher will read the academic literature to understand what is already known about the question they want to answer. Then they will design an experiment to test their hypothesis. Next comes collecting the data, and then analysing it. The final step is to publish a summary of this work, and then the whole process can start again.

Ezer and Whitaker highlight how data science can help the researcher in all the different aspects of this cycle.

  • Natural language processing can summarise thousands of published papers to draw out new hypotheses for a researcher to investigate.
  • Statistical methods can optimise the power of an experiment by selecting which observations should be collected.
  • Robotics and software pipelines can automate data collection and analysis, and incorporate machine learning analyses to adaptively identify which data needs to be collected based on incoming data.
  • The traditional output of research, a static PDF manuscript, can be enhanced to include analysis code and well-documented datasets to make the next iteration of the cycle faster and more efficient.

Figure: Integrating data science into the scientific life cycle
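
To make the second point in the list above a little more concrete, here is a minimal sketch of how a statistical calculation can help size an experiment before any data is collected. It is not taken from the eLife article: the function name `sample_size_per_group`, the default values, and the choice of a normal approximation for a two-sided comparison of two group means are all assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate observations needed per group to detect a difference.

    Assumes (for illustration only) a two-sided comparison of two group
    means with equal group sizes, using the normal approximation.
    `effect_size` is the standardised difference between the means
    (difference divided by the common standard deviation).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    # Critical value for the chosen two-sided significance level.
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    # Quantile corresponding to the desired power.
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Smaller expected effects need many more observations per group.
    for d in (0.2, 0.5, 0.8):
        print(f"effect size {d}: {sample_size_per_group(d)} per group")
```

Under these assumptions, a medium standardised effect of 0.5 needs roughly 63 observations per group, while a small effect of 0.2 needs several hundred. Making that trade-off visible before data collection begins is the kind of planning the list describes.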

Bringing together diverse disciplines

The 'Data science for experimental design' workshop that Ezer and Whitaker hosted on this topic represented the diversity of skills and backgrounds that can be impacted by data science. Four universities and one industry biotechnology firm were represented in the three sets of talks, which corresponded to planning (Dr Stephanie Biedermann and Dr Ozgur Akman), performing (Prof Ross King and Dr Vishal Sanchania) and analysing (Dr Sebastien Besson and Dr Rachael Ainsworth) experiments.

The workshop closed with a discussion led by three social scientists who had studied the behaviour of experimental scientists (Dr Sarah Abel, Dr Louise Bezuidenhout and Chris Mellingwood). All members of the workshop contributed to building a list of challenges faced by researchers who want to use data science techniques to improve the scientific life cycle.

Theory into practice

Many of the barriers to translating theory into practice are already being investigated by groups at the Turing. The eLife point of view article identifies a clear pathway to supercharge research in science and the humanities, and recognises the importance of training a new generation of data scientists who can work collaboratively across many different domains of expertise.

The Institute’s research engineering team and data science at scale programmes are helping to build capacity for lab and research infrastructure, and the health and medical sciences programme is building links within the National Health Service to help translate this fundamental work into the greatest possible impact within the UK and beyond.

Ezer and Whitaker note the importance of another of the Institute’s core challenges: to make algorithmic systems fair, transparent, and ethical. It is imperative that experimental outcomes are interpretable and free of bias, that we strive for equity in access to the outputs of research around the globe, and that we collectively work to change the academic incentive structures that currently limit collaboration across disciplines.

Conclusion

Robots are not coming to replace scientists. Rather, there is an exciting future in which scientists are able to spend time augmenting the automated parts of the research cycle. They will always be needed to provide insight into which hypotheses are interesting to the community and to describe how experimental results fit into proposed theories.

The scientist of the future will find innovative connections across disciplinary boundaries. They will be focused on the varied and interesting parts of science, rather than the mundane and repetitive parts.

Some sections of this article are re-used from the main article under a CC-BY licence. Daphne and Kirstie thank the speakers and attendees at the 'Data Science for Experimental Design' workshop, and Anneca York and the Events Team at The Alan Turing Institute.