The members of The Alan Turing Institute’s Research Engineering Group (informally known as ‘Hut23’) are experienced data science researchers who are committed to professional delivery of impactful research, rather than to personal research interests. We support the mission of the institute by connecting pure research to applications, and by ensuring our research generates usable, sustainable tools.
We use the challenges that arise in practical projects to inspire new research initiatives, while bringing customers and collaborators novel techniques and methods that go beyond current practice in data science.
Working with the Turing research community, and with research professionals in partner institutions, the group provides a comprehensive source of in-house research skills for the Institute’s activities.
Visit the Group’s lab pages, which include information on its projects and regular lunchtime tech talks.
Our Data Scientists have expertise in computational statistics, inference, and machine learning, as well as mathematical and computational modelling of complex systems, knowledge representation, and operations research.
They apply their skills to clean, wrangle and analyse data, and to deploy analyses developed by Turing researchers on our high-performance computing platforms.
Research software engineering
Our Research Software Engineers collaborate with our researchers to build and maintain software that implements and supports the research activities.
Research Software Engineers work with researchers to create software requirements, develop code, document and explain the software, and support the release and maintenance of the software through open-source channels and publication in research journals.
Staff in this team have the skills and background to create readable, reliable and efficient compute- and data-intensive software, embodying in code sophisticated mathematical and statistical concepts.
AIDA is “Artificial Intelligence for Data Analytics”. In this project, researchers at the Institute are drawing on new advances in artificial intelligence and machine learning to address data wrangling issues; they aim to develop systems that help to automate each stage of the data analytics process.
This is a joint venture project with University College London, Imperial College and industry partners. Code Blue is a science gateway that scales to let users run fluid dynamics simulations in the cloud or on their university computing infrastructure.
Working with the children’s charity Coram, this project explores how data collected on children in care can be modelled and visualised to help inform the decisions of local authorities.
Datadiff is an AIDA subproject which aims to automate the process of reconciling inconsistencies between pairs of tabular datasets whose information content is supposed to be similar.
When a dataset is batched into multiple tables, for instance due to periodic data collection, it is not uncommon to find discrepancies in format or structure between the different batches, such as the renaming and/or reordering of columns, changes in units of measurement or encodings, introduction of new columns, etc. Such differences impose an overhead on any consumer of the data wishing to join the separate pieces into a consistent whole.
Typically this process involves human intervention: people are good at resolving issues of this kind by spotting patterns and making educated guesses. Datadiff is an attempt to solve the problem algorithmically.
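As a rough illustration of the kind of reconciliation described above, the following minimal sketch matches renamed and reordered columns between two batches by the overlap of their values. It is not Datadiff's actual algorithm: the function names, the Jaccard-overlap heuristic, the greedy matching and the sample data are all assumptions made for this example.

```python
def jaccard(a, b):
    """Share of distinct values that two columns have in common."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def match_columns(reference, candidate):
    """Greedily map candidate column names to reference column names.

    Both arguments are dicts of column name -> list of values.
    Pairs are taken in order of decreasing value overlap; columns with
    no overlap at all are left unmatched (e.g. newly introduced columns).
    """
    scored_pairs = sorted(
        (
            (jaccard(ref_values, cand_values), ref_name, cand_name)
            for ref_name, ref_values in reference.items()
            for cand_name, cand_values in candidate.items()
        ),
        reverse=True,
    )
    mapping, used_ref, used_cand = {}, set(), set()
    for score, ref_name, cand_name in scored_pairs:
        if score == 0.0:
            break  # remaining pairs share no values
        if ref_name in used_ref or cand_name in used_cand:
            continue
        mapping[cand_name] = ref_name
        used_ref.add(ref_name)
        used_cand.add(cand_name)
    return mapping


# Hypothetical batches: the second has renamed and reordered columns.
batch_1 = {"city": ["Leeds", "York", "Bath"], "temp_c": [12, 14, 15]}
batch_2 = {"temperature": [14, 15, 16], "town": ["York", "Bath", "Hull"]}

column_mapping = match_columns(batch_1, batch_2)
print(column_mapping)
```

A heuristic this simple would not cope with changes in units or encodings, since those alter the values themselves. That is part of what makes the general problem hard to solve algorithmically.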
This is a short project led by Institute researcher Tomas Petricek. The Gamma project empowers anyone to examine data, learn to question the legitimacy of its data sources, and appreciate the context in which numbers are presented. The code behind the project is open source.
This project with the National Cyber Security Centre will attempt to identify websites that are provided “unofficially” by government and are not under a well-known top-level domain.
PDQ (proof-driven querying) is a Java library developed by researchers at the University of Oxford for generating database query plans over semantically-interconnected data sources with diverse access interfaces.
A key target application is the problem of making NHS data accessible to data scientists while respecting constraints imposed by privacy, integrity and efficiency. Our project aims to bring this goal closer by refining and extending the library’s query execution functionality.
SHEEP is a homomorphic encryption evaluation platform. Homomorphic encryption allows mathematical operations to be performed on encrypted data, such that the result when decrypted is what would have been obtained from addition or multiplication of the plaintexts. The goal of the project is to provide an accessible platform for testing the features and performance of the available homomorphic encryption libraries.
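The homomorphic property described above can be seen in a toy setting. The sketch below uses unpadded "textbook" RSA with tiny, insecure parameters, which happens to be homomorphic for multiplication only. It does not use SHEEP or any of the libraries SHEEP evaluates, and the function names and parameter values are assumptions chosen for illustration.

```python
# Toy textbook-RSA parameters: far too small to be secure, illustration only.
P, Q = 61, 53
N = P * Q   # public modulus (3233)
E = 17      # public exponent
D = 2753    # private exponent, chosen so that (E * D) % ((P - 1) * (Q - 1)) == 1


def encrypt(plaintext):
    """Encrypt an integer 0 <= plaintext < N with the public key."""
    return pow(plaintext, E, N)


def decrypt(ciphertext):
    """Decrypt a ciphertext with the private key."""
    return pow(ciphertext, D, N)


def multiply_encrypted(c1, c2):
    """Multiply two ciphertexts without ever decrypting them."""
    return (c1 * c2) % N


a, b = 7, 6
product_ciphertext = multiply_encrypted(encrypt(a), encrypt(b))

# Decrypting the combined ciphertext gives the product of the plaintexts,
# provided that product is smaller than N.
assert decrypt(product_ciphertext) == a * b
```

Practical homomorphic encryption schemes differ substantially from this toy: they are randomised, and many of them support addition as well as multiplication.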
Topological Data Analysis
The goal of this project is to rewrite existing topological data analysis code, created by a Turing PhD student, so that it first runs in parallel on an effectively unlimited number of CPUs with near-linear scaling and second integrates into at least one public software package.
Working with us
Turing Fellows and students looking for help from the Group should email firstname.lastname@example.org.
Work is usually recharged to various Turing research budgets, or to other research grants, and we will work with you to find an appropriate funding source.
The team is also keen to get involved in research grants being developed by the Institute and collaborating universities – we can be included to provide any amount of software engineering or data science effort for your grant.
While the work is usually costed, the Group is occasionally able to support unfunded projects for free where there is a strong strategic reason – do get in touch.
Soon we hope to be able to share our expertise with external partners. In the interim, please contact email@example.com to register your interest in working with us.
Dr James Hetherington
James is Director of Research Engineering at the Institute, and of the Research Software Development Group at UCL. His work focuses on ensuring that scientific software meets the highest standards of software architecture and quality assurance.
He has worked with researchers in many fields – from DNA whodunit software, modelling the future of the UK electricity network, and analysis of ancient Mesopotamian texts to computational fluid dynamics in brain blood vessel networks. A founding chair of the UK Community of Research Software Engineers, James is a Fellow of the Software Sustainability Institute, and has advised BBSRC, JISC, and EPSRC on research software issues.
James has a PhD in computational particle physics from Cambridge, and has extensive computational research experience, including developing systems for multiscale modelling in physiology with the DTI Beacon Project and new tools for extreme scale computing with the CRESTA EU FP7 Exascale project.
In addition to his academic experience, he has industrial research software engineering experience, and was senior developer in the Model Management Group at The MathWorks, makers of MATLAB.
Dr Martin O’Reilly
Martin is Principal Research Software Engineer at the Turing, leading the software engineering side of the group. His focus is on using good software engineering practices to increase the impact of research software by making it reusable, reliable and robust. He also has a strong interest in reproducible research, and is helping to improve the tools and working practices available at the Turing to make it easier for researchers to work reproducibly.
Martin has a PhD in computational neuroscience from UCL and an MSc in artificial intelligence from Edinburgh. He has extensive experience of developing software and managing software projects in both the academic and commercial sectors, having previously worked on developing web applications, cross-industry data standards, exam processing systems and image processing algorithms.
Dr May Yong
May is a Research Engineer at The Alan Turing Institute. She has worked across multiple research domains, collaborating with scientists to turn interesting ideas into usable software. Past projects include machine learning on neonatal intensive care data and tools for enabling the exploration of UK government expenditure data.
She has previously worked on creating data standards for antibody therapy experiments at UCL Cancer Institute and worked on amalgamation of heterogeneous multiple sclerosis data at the Imperial Data Science Institute.
May is interested in data interpretation, specifically ways of providing information in perspective in order to see the bigger picture. She is building tools to minimise data ambiguity so that data is interpreted and used the way the data collectors intended.
Dr Radka Jersakova
Radka is a Junior Data Scientist at The Alan Turing Institute. She is passionate about using data science tools in collaboration with researchers, research software engineers and external partners (industry, government, third sector) to help solve real world problems. She is particularly interested in applications of Bayesian methods.
Prior to joining the Institute she completed a PhD in Cognitive Psychology at the University of Leeds. During her PhD she was involved in a number of projects evaluating and developing behavioural research methods.
She also spent time as a visiting researcher at the University of Burgundy and Columbia University and she holds an MA in Philosophy and Psychology from the University of St Andrews.
Angus is a Junior Data Scientist at The Alan Turing Institute. He has a PhD in Astronomy from the University of Cambridge, which involved modelling the output of surveys of the Milky Way galaxy using Bayesian approaches.
He also has experience in commercial data science, where he has worked on projects involving natural language processing for document classification, and deep learning for image and video classification.
His interests include Bayesian methods and probabilistic programming, machine learning and Python.
Tim is a mathematician and developer of software for data analysis. He received his PhD from the University of Warwick for research in the field of probability and stochastic processes, after which he worked in the City of London as a consultant specialising in quantitative risk management.
In 2011 he relocated to Rio de Janeiro, his wife’s home town, and worked as a quantitative analyst at Fundação Getulio Vargas, a leading Brazilian university and think tank. In this role he led projects in diverse areas of data science including econometric modelling of international trade, methods for constructing composite indicators and state space time series analysis. He also taught courses on topics in applied economics and statistical computing.
After six years in Brazil, Tim returned to London to take up the role of Senior Research Software Engineer at The Alan Turing Institute. In addition to data analysis and visualisation his interests include object-oriented design, functional programming, cryptography and cryptocurrency.
Dr Oliver Strickson
Oliver is a Junior Research Software Engineer at The Alan Turing Institute. He has a PhD in physics from the University of Cambridge, where he was based at the Cavendish Laboratory and worked on techniques for producing accurate material models for continuum mechanics simulations based on first-principles atomistic techniques, with a particular focus on shock-waves in condensed matter.
Following this, he worked as a software engineer, writing research software for the numerical simulation of fluid and solid dynamics.
He was involved in several successful industrial collaborations with partners in the mining, defence and oil and gas industries. He also holds a degree in mathematics.
Dr Nick Barlow
Nick is a Research Software Engineer at The Alan Turing Institute. He has a PhD in particle physics from the University of Bristol, and worked as a postdoc for the University of Manchester on the BaBar experiment at the Stanford Linear Accelerator Center, and then for the University of Cambridge on the ATLAS experiment at the Large Hadron Collider at CERN.
In both of these experiments, he was involved both in the exploitation of the huge datasets generated by the colliders and in the software infrastructure facilitating the analysis.
He then worked as a software developer at the National Cyber Crime Unit, helping to develop their capability to process and analyse large and complex datasets.
Dr James Geddes
James is Principal Data Scientist at the Turing, leading the data science side of the Group. He is interested in understanding how new ideas in machine learning can be applied to real-world challenges. He is especially keen to find ways to reduce the surprising amount of manual “data wrangling” that all projects seem to need.
James has a PhD in physics from Chicago. He has experience as a data scientist in industry and government, in building analytical models that are transparent and understandable, and teaching data science.
Giovanni is a Data Scientist at The Alan Turing Institute. He did his PhD in Technology Management at the Digital Humanities Laboratory of the EPFL in Lausanne, working on methods for text mining and citation analysis of scholarly publications. He was for two years the operations manager of the Venice Time Machine, a large-scale digitisation and indexation project based at the Archives of Venice, and is cofounder of Odoma, a start-up offering customised machine learning techniques in the cultural heritage domain.
Giovanni is interested in how data science can contribute to the scientific endeavour, and to society at large. He is particularly passionate about the humanities and the study of the past through an interplay of quantitative and qualitative methods.
Prior to joining the Turing, Giovanni was a researcher at the University of Leiden (Centre for Science and Technology Studies), the Leibniz Institute of European History in Mainz, and the University of Oxford. He studied computer science (BSc) and history (BA, MA) in Udine, Milan, Padua and Venice in Italy.
Evelina is a Data Scientist at The Alan Turing Institute. She is passionate about making data science understandable and accessible to everyone.
She started out as a programmer but became interested in machine learning early on and did a mathematics PhD at the University of Cambridge. During her PhD, she worked on Bayesian models for unsupervised learning that integrate heterogeneous biomedical datasets. After that she worked in cancer research at the MRC Cancer Unit in Cambridge, where she focused on helping biologists analyse genomic data.
Outside of academia, she is also an avid speaker at developer conferences and she was awarded the Microsoft MVP award for her work in the F# community.