Researchers and practitioners face a common need for high-quality tools, practices, methodologies, platforms and systems.
Many domains can benefit from the deployment of cutting-edge algorithms and approaches, but these cannot be effectively applied unless realised as usable software libraries, reproducible analyses and workflows, or high-performance computational environments.
We avoid the temptation to create tools that we imagine ought, in principle, to be useful but then go unused: our projects always include joint activity with other areas. As a community (informally known as ‘Hut23’) we are committed to the professional delivery of impactful research, as well as to our own research interests. We often contribute our skills in research software engineering and data science to projects in other programmes.
Visit the Group’s lab pages, which include information on their regular lunchtime tech talks and projects.
Research data scientists
Our Research Data Scientists have expertise in computational statistics, inference, and machine learning, as well as mathematical and computational modelling of complex systems, knowledge representation, and operations research.
They apply their skills to clean, wrangle and analyse data, and to deploy analyses developed by Turing researchers on our high-performance computing platforms.
Research software engineers
Our Research Software Engineers collaborate with our researchers to build and maintain software that implements and supports the research activities.
Research Software Engineers work with researchers to create software requirements, develop code, document and explain the software, and support the release and maintenance of the software through open-source channels and publication in research journals.
Artificial intelligence for data analytics (AIDA)
In this project, researchers at the Institute are drawing on new advances in artificial intelligence and machine learning to address data wrangling issues; they aim to develop systems that help to automate each stage of the data analytics process.
Artificial intelligence for data analytics (AIDA) - Datadiff
Datadiff is an AIDA sub-project which aims to automate the process of reconciling inconsistencies between pairs of tabular datasets whose information content is supposed to be similar.
When a dataset is batched into multiple tables, for instance due to periodic data collection, it is not uncommon to find discrepancies in format or structure between the different batches, such as the renaming and/or reordering of columns, changes in units of measurement or encodings, introduction of new columns, etc. Such differences impose an overhead on any consumer of the data wishing to join the separate pieces into a consistent whole.
Typically this process involves human intervention: people are good at resolving issues of this kind by spotting patterns and making educated guesses. Datadiff is an attempt to solve the problem algorithmically.
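As a rough illustration of the kind of guesswork Datadiff automates, the sketch below matches renamed and reordered columns between two batches of a dataset by comparing their value sets. This is not the actual Datadiff algorithm (which selects among candidate patches statistically); the function and data here are invented for illustration.

```python
def match_columns(old, new):
    """Pair each column of `new` with the column of `old` whose values
    it most closely resembles.  Illustrative sketch only: the real
    Datadiff tool chooses among candidate 'patches' (renames,
    recodings, unit changes) using a statistical approach."""
    matches = {}
    for new_name, new_values in new.items():
        best_name, best_score = None, -1.0
        for old_name, old_values in old.items():
            # Score: fraction of distinct values the two columns share.
            shared = len(set(new_values) & set(old_values))
            score = shared / max(len(set(new_values)), 1)
            if score > best_score:
                best_name, best_score = old_name, score
        matches[new_name] = best_name
    return matches

# Two batches of the "same" data, with columns renamed and reordered.
batch1 = {"region": ["N", "S", "E"], "sales": [10, 20, 30]}
batch2 = {"amount": [30, 10, 20], "area": ["E", "N", "S"]}

print(match_columns(batch1, batch2))
# {'amount': 'sales', 'area': 'region'}
```

A value-overlap score works for this toy case but fails when units or encodings change, which is exactly where a statistical treatment of candidate patches becomes necessary.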
Code Blue
This is a joint project with University College London, Imperial College and industry partners. Code Blue is a science gateway that scales to enable users to run fluid dynamics simulations in the cloud or on their university’s computing infrastructure.
Working with the children’s charity Coram, this project explores how data collected on children in care can be modelled and visualised to help inform the decisions of local authorities.
Evaluating homomorphic encryption (SHEEP)
SHEEP is a homomorphic encryption evaluation platform. Homomorphic encryption allows mathematical operations to be performed on encrypted data, such that the decrypted result is the same as would have been obtained by adding or multiplying the plaintexts. The goal of the project is to provide an accessible platform for testing the features and performance of the available homomorphic encryption libraries.
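The homomorphic property can be seen in miniature with textbook RSA, which is multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product of the plaintexts. This toy key is far too small to be secure, and the lattice-based schemes SHEEP benchmarks work quite differently; the sketch only demonstrates the underlying idea of computing on encrypted data.

```python
# Tiny textbook RSA key (p=61, q=53), for illustration only.
n, e, d = 3233, 17, 2753

def encrypt(m):
    return pow(m, e, n)   # c = m^e mod n

def decrypt(c):
    return pow(c, d, n)   # m = c^d mod n

a, b = 7, 6
# Multiply the ciphertexts without ever seeing the plaintexts...
c = (encrypt(a) * encrypt(b)) % n
# ...and the decryption is the product of the plaintexts.
print(decrypt(c))  # 42
```

Schemes that support both addition and multiplication on ciphertexts (fully homomorphic encryption) are what make arbitrary computation on encrypted data possible, at a performance cost that platforms like SHEEP aim to quantify.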
The Gamma
This is a short project led by Institute researcher Tomas Petricek. The Gamma project empowers anyone to examine data, question the legitimacy of its sources, and appreciate the context in which numbers are presented. The code behind the project is open source.
This project with the National Cyber Security Centre will attempt to identify websites that are provided “unofficially” by government and are not under a well-known top-level domain.
Proof-driven querying (PDQ)
PDQ is a Java library developed by researchers at the University of Oxford for generating database query plans over semantically-interconnected data sources with diverse access interfaces.
A key target application is making NHS data accessible to data scientists while respecting constraints imposed by privacy, integrity and efficiency. Our project aims to bring this goal closer by refining and extending the library’s query execution functionality.
Safety of offshore floating facilities
This project is a collaboration with the Australian Research Council Industrial Transformation Research Hub for Offshore Floating Facilities. Solitons are large, non-linear waves that can be generated offshore by certain conditions involving tidal forces and the shape of a continental shelf. They can have significant impact on offshore facilities for the oil and gas industries. The goal of this project is to provide a probabilistic model for soliton formation, producing a distribution of predicted soliton amplitudes, in order to inform decision making.
A robust and well-tested set of software packages will enable these output distributions to be calculated in a timely manner, allow new measurements to be incorporated into the modelling, and present the modelling results via an intuitive dashboard.
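One common way to turn a model of soliton formation into a distribution of predicted amplitudes is Monte Carlo uncertainty propagation: sample the uncertain environmental inputs many times and collect the resulting amplitudes. The sketch below uses an invented placeholder relationship (not the project's physical model); the input names, distributions and formula are all assumptions for illustration.

```python
import random
import statistics

random.seed(0)  # reproducible sampling

def soliton_amplitude(tidal_forcing, shelf_slope):
    # Placeholder relationship between inputs and amplitude,
    # standing in for the project's actual physical model.
    return 5.0 * tidal_forcing * shelf_slope

# Sample uncertain inputs and propagate them through the model.
samples = [
    soliton_amplitude(random.gauss(1.0, 0.2), random.gauss(0.5, 0.1))
    for _ in range(10_000)
]

print(f"mean amplitude:  {statistics.mean(samples):.2f}")
print(f"95th percentile: {statistics.quantiles(samples, n=20)[-1]:.2f}")
```

For decision making it is the tail of this distribution, rather than the mean, that typically matters, which is why producing the full distribution is more useful than a single point prediction.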
Scalable topological data analysis
The goal of this project is to rewrite existing topological data analysis code, created by a Turing PhD student, so that it runs massively in parallel on an effectively unlimited number of CPUs with near-linear scaling, and to integrate it into at least one public software package.
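The structure that makes near-linear scaling plausible is an embarrassingly parallel step: a summary is computed independently on each chunk of data and the results are then merged. The sketch below is hypothetical (the chunking scheme and the stand-in summary function are not from the project, which would use processes or MPI rather than a thread pool), but it shows the parallel-map shape of such a rewrite.

```python
from concurrent.futures import ThreadPoolExecutor

def summarise_chunk(points):
    # Stand-in for an expensive per-chunk computation, e.g. a
    # persistent-homology summary of one partition of the data.
    return max(points) - min(points)

chunks = [[1, 5, 3], [10, 2, 8], [4, 4, 9]]

# Each chunk is summarised independently, so the work distributes
# across workers with no coordination until the final merge.
with ThreadPoolExecutor() as pool:
    summaries = list(pool.map(summarise_chunk, chunks))

print(summaries)  # [4, 8, 5]
```

Because the per-chunk computations share no state, adding workers (or machines) divides the wall-clock time until the merge step dominates, which is the source of the near-linear scaling the project targets.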
Security in the cloud
This project is a collaboration with Imperial College and aims to establish a cloud computing model for sensitive data, where the cloud provider itself can remain untrusted by the owner of the data. It does this by leveraging Intel's Software Guard Extensions (SGX), which are available in recent generations of Intel CPUs and allow certain computations to be carried out in a trusted "enclave" of an otherwise untrustworthy system.
An outcome of the project will be an extension to the Apache Spark big-data platform to allow this technology to be used for distributed computation, as well as a significant amount of the supporting operating-system-level infrastructure.
The Group is offering a course on research software engineering with Python for Turing Doctoral Students and a limited number of Turing researchers.
Working with us
Turing Fellows and students looking for help from the Group should email the relevant Research Engineering Challenge Lead.
REG Challenge Leads