The Alan Turing Institute

Data Study Group - December 2018

Bringing together top talent from data science, artificial intelligence, and wider fields, to analyse real-world challenges.

Monday 10 Dec 2018 - Friday 14 Dec 2018
Time: 10:00 - 20:00

Event type

Data Study Groups

Audience type

Cross-disciplinary
Free

Event series

Data Study Groups

About the event

We invite data practitioners, PhD students, postdocs and other early career researchers to apply for our next Data Study Group, which will be held Monday 10 December - Friday 14 December 2018 at The Alan Turing Institute in London.

What are Data Study Groups?

  • Intensive five-day 'collaborative hackathons' hosted at the Turing, which bring together organisations from industry, government, and the third sector with talented multi-disciplinary researchers from academia
  • Organisations act as Data Study Group 'Challenge Owners' and provide real-world problems and data sets to be tackled by small groups of highly talented, carefully selected researchers
  • Researchers brainstorm and engineer data science solutions, presenting their work at the end of the week

Why apply?

The Turing Data Study Groups are popular and productive collaborative events and a fantastic opportunity to rapidly develop and test your data science skills with real-world data. The event also offers participants the chance to forge new networks for future research projects, and build links with The Alan Turing Institute – the UK’s national institute for data science and artificial intelligence.

It’s hard work, a crucible for innovation and excellent science, and a space to develop new friendships.

Reports from a previous Data Study Group are available in the Outcomes section of the December 2017 Data Study Group pages.

Challenges

Our challenges and data sets are provided by partner organisations for researchers to work on over the week. The organisations and challenges leading the Data Study Group this December are:

  • Defence Science and Technology Laboratory (Dstl) - Seeing the unseen: Interrogating cellular-level response to hazardous materials
  • Imperial College London Statistics Section, Los Alamos National Laboratory, and The Heilbronn Institute for Mathematical Research - Developing data science tools for improving enterprise cyber-security
  • MedImmune - Machine learning for enhanced understanding in cell culture bioprocess development
  • National Cyber Security Centre - Capturing the function of websites
  • NATS UK Air Traffic Services Provider - Next-generation trajectory prediction for air traffic control
  • PlayerLens - Player Pathways: Understanding career paths that deliver success for professional football players and clubs

Please see below for further details on each challenge.

The skills that we think are particularly relevant to the challenges for this Data Study Group are listed under each challenge description below. Please note that the lists are not exhaustive and we are open to creative interpretation of the challenges listed. Diversity of disciplines is encouraged, and we warmly invite applications from a range of academic backgrounds and specialisms.


How to apply

Applicants are required to submit a summary of technical skills (with evidence), collaborative experience (with examples), and a CV. We are also interested in applicants’ personal motivations for participating in the Data Study Group.

Application form

As the Data Study Group takes place over five days, we prefer applicants who can attend the full week. However, we can offer flexibility and will consider participants for one or more days.

Deadline for applications: Monday 22 October 2018 12:00 GMT

We will let you know the outcome of your application by 14 November 2018.

We look forward to receiving your applications.

Expenses

The Alan Turing Institute will cover travel costs in alignment with our expenses policy. We will also provide accommodation for researchers who are not normally London-based. Accommodation may also be available for researchers from a London university or research institute who travel from outside London to work. Expenses for international applicants are capped at £200, which includes any visa costs. Lunch and dinner are provided for participants during the week.

Challenge descriptions 

Defence Science and Technology Laboratory (Dstl)

Seeing the unseen: Interrogating cellular-level response to hazardous materials 

Understanding how the body responds to hazardous materials is the first step towards mitigating the potential harm they pose. The advent of modern, high-throughput sequencing technology such as RNAseq has provided new insights into the changes that occur in response to these substances within the basic units of life - individual cells.

This challenge is based around using modern analytical methods to identify variations in the cellular response at different time points within controlled experiments following exposure to a variety of dangerous substances (e.g. from plague to toxins and hazardous chemicals). The aim is to identify a series of protein precursors or patterns that would be suitable for subsequent evaluation as biomarkers to aid diagnosis, triage and/or treatment in the event of an exposure to a hazardous material.

Useful skills: biostatistics, sparse regression, statistical genetics

Imperial College London Statistics Section, Los Alamos National Laboratory, and The Heilbronn Institute for Mathematical Research

Developing data science tools for improving enterprise cyber-security

There is increasing attention being directed from both government and industry towards the use of statistical, machine learning and broader data science techniques for improving cyber-security. This challenge aims to take a first step in providing a unified toolset for exploiting the variety of cyber-relevant data sources which are available. The challenge will be based around a unified data repository released by Los Alamos National Laboratory, comprising both network flow records and process-level Windows service logs collected on the same enterprise computer network over a three-month period.
 
There are three aspects that could potentially be tackled in this challenge: anomaly detection, data fusion, and visualisation. Can we detect unlabelled red team activity in the data? Can we effectively fuse the data sources to give a more coherent view? What visualisations might aid prioritisation of review of potential threats by analysts? We are most interested in seeing some creative approaches to visualisation and fusion.

Useful skills: data visualisation, anomaly detection, reinforcement learning, cyber security, event data, online learning

MedImmune

Machine learning for enhanced understanding in cell culture bioprocess development

Supplying breakthrough modern medicines requires increasingly sophisticated industrial manufacturing processes to cultivate and exploit animal cell cultures: understanding and controlling performance in these highly complex biological systems is tough, making effective process design difficult to achieve by classical biochemical engineers working alone.  

This challenge will focus on culture pH as the key input and investigate feature-rich time series data sets from prior industrial reactor runs, with the aim of uncovering new insights about pH effects on the kinetics of cell culture growth, metabolism, product quality and productivity.

Useful skills: time series prediction, statistics of experiments, probabilistic and stochastic modelling

National Cyber Security Centre

Capturing the function of websites

The web has become a key part of everyday life for individuals and businesses, enabling information retrieval, commerce, and collaboration. To better facilitate website recommendation, semantic search, discovery and security, the NCSC will ask: can we characterise the function of a website using hyperlinks and text information, in a way that facilitates large-scale machine learning? This challenge aims to explore this question through state-of-the-art network embedding approaches from the fields of machine learning, deep learning and graph theory.
 
Useful skills: neural networks, network embeddings, representation learning (on networks and texts), graph algorithms

NATS - UK Air Traffic Services Provider

Next-generation trajectory prediction for air traffic control

Predictions of aircraft movements underpin many key air traffic control systems, and this challenge has the potential to guide future developments towards improved capacity, safety, and resilience for air traffic services in the UK.

This challenge will investigate how modern data science techniques can be applied to a wide range of historical aircraft data in order to accurately predict aircraft trajectories.

Useful skills: filtering and tracking algorithms, time series prediction, probabilistic prediction

PlayerLens

Player Pathways: Understanding career paths that deliver success for professional football players and clubs

PlayerLens, in collaboration with Opta, is offering access to a unique database of detailed career records covering a substantial fraction of all professional players that ever existed. The data will support an empirical study of career path modelling and an investigation of career determinants using data science and AI techniques.

Useful skills: event data, sports modelling, career modelling

Find out more

How to get involved as a researcher

How to write a great Data Study Group application

Queries can be directed to Data Study Group