What are Data Study Groups?
- Intensive five-day 'collaborative hackathons' hosted at the Turing, which bring together organisations from industry, government and the third sector with talented multidisciplinary researchers from academia
- Organisations act as Data Study Group 'Challenge Owners' and provide real-world problems and data sets to be tackled by small groups of highly talented, carefully selected researchers
- Researchers brainstorm and engineer data science solutions, presenting their work at the end of the week
About the event
The Turing Data Study Groups are popular and productive collaborative events and a fantastic opportunity to rapidly develop and test your data science skills with real-world data. The event also offers participants the chance to forge new networks for future research projects, and build links with The Alan Turing Institute – the UK’s national institute for data science and artificial intelligence.
It’s hard work, a crucible for innovation and excellent science, and a space to develop new friendships.
Reports from a previous Data Study Group are available in the Outcomes section of the December 2017 Data Study Group pages.
Our challenges and data sets are provided by partner organisations for researchers to work on over the week. The organisations and challenges leading the Data Study Group this December are:
- Defence Science and Technology Laboratory (Dstl) - Seeing the unseen: Interrogating cellular-level response to hazardous materials
- Imperial College London Statistics Section, Los Alamos National Laboratory, and The Heilbronn Institute for Mathematical Research - Developing data science tools for improving enterprise cyber-security
- AstraZeneca - Machine learning for enhanced understanding in cell culture bioprocess development
- National Cyber Security Centre - Capturing the function of websites
- NATS UK Air Traffic Services Provider - Next-generation trajectory prediction for air traffic control
- PlayerLens - Player Pathways: Understanding career paths that deliver success for professional football players and clubs
The skills that we think are particularly relevant to each challenge are listed under its description below. Please note that the lists are not exhaustive, and we are open to creative interpretations of the challenges. Diversity of disciplines is encouraged, and we warmly invite applications from a range of academic backgrounds and specialisms.
How to apply
Applications for this Data Study Group are closed.
Applicants will be contacted regarding the outcome of their application by 14 November 2018.
If you have questions about your application, please contact [email protected].
The Alan Turing Institute will cover travel costs in line with our expenses policy. We will also provide accommodation for researchers who are not normally based in London. Researchers from a London university or research institute who travel in from outside London may also be eligible for accommodation. Expenses for international applicants are capped at £200, including any visa costs. Lunch and dinner are provided for participants during the week.
Seeing the unseen: Interrogating cellular-level response to hazardous materials
Understanding how the body responds to hazardous materials is the first step towards mitigating the potential harm they pose. The advent of modern, high-throughput sequencing technology such as RNA-seq has provided new insights into the changes that occur, in response to these substances, within the basic units of life: individual cells.
This challenge is based around using modern analytical methods to identify variations in the cellular response at different time points in controlled experiments following exposure to a variety of dangerous substances (e.g. plague, toxins and hazardous chemicals). The aim is to identify a series of protein precursors or patterns that would be suitable for subsequent evaluation as biomarkers to aid diagnosis, triage and/or treatment in the event of exposure to a hazardous material.
Useful skills: biostatistics, sparse regression, statistical genetics
Developing data science tools for improving enterprise cyber-security
Government and industry are paying increasing attention to the use of statistical, machine learning and broader data science techniques for improving cyber-security. This challenge aims to take a first step towards a unified toolset for exploiting the variety of cyber-relevant data sources that are available. The challenge will be based around a unified data repository released by Los Alamos National Laboratory, comprising both network flow records and process-level Windows service logs collected on the same enterprise computer network over a three-month period.
There are three aspects that could be tackled in this challenge: anomaly detection, data fusion and visualisation. Can we detect unlabelled red team activity in the data? Can we fuse the data sources effectively to give a more coherent view? What visualisations might help analysts prioritise which potential threats to review? We are particularly interested in creative approaches to visualisation and fusion.
Useful skills: data visualisation, anomaly detection, reinforcement learning, cyber security, event data, online learning
Machine learning for enhanced understanding in cell culture bioprocess development
Supplying breakthrough modern medicines requires increasingly sophisticated industrial manufacturing processes to cultivate and exploit animal cell cultures. Understanding and controlling performance in these highly complex biological systems is tough, which makes effective process design difficult for classical biochemical engineers to achieve working alone.
This challenge will focus on culture pH as the key input and investigate feature-rich time series data sets from prior industrial reactor runs, with the aim of uncovering new insights into the effects of pH on the kinetics of cell culture growth, metabolism, product quality and productivity.
Useful skills: time series prediction, statistics of experiments, probabilistic and stochastic modelling
Capturing the function of websites
The web has become a key part of everyday life for individuals and businesses, enabling information retrieval, commerce and collaboration. To better support website recommendation, semantic search, discovery and security, the NCSC asks: can we characterise the function of a website from its hyperlinks and text in a way that supports large-scale machine learning? This challenge aims to explore this question through state-of-the-art network embedding approaches from the fields of machine learning, deep learning and graph theory.
Useful skills: neural networks, network embeddings, representation learning (on networks and texts), graph algorithms
NATS - UK Air Traffic Services Provider
Next-generation trajectory prediction for air traffic control
Predictions of aircraft movements underpin many key air traffic control systems, and this challenge has the potential to guide future developments towards improved capacity, safety, and resilience for air traffic services in the UK.
This challenge will investigate how modern data science techniques can be applied to a wide range of historical aircraft data in order to accurately predict aircraft trajectories.
Useful skills: filtering and tracking algorithms, time series prediction, probabilistic prediction
Player Pathways: Understanding career paths that deliver success for professional football players and clubs
PlayerLens, in collaboration with Opta, is offering access to a unique database of detailed career records, covering a substantial fraction of all professional players who have ever played, for an empirical study that models career paths and investigates the determinants of career success using data science and AI techniques.
Useful skills: event data, sports modelling, career modelling
Find out more
Queries can be directed to the Data Study Group team.