Data Study Group December 2017


The Turing Data Study Group brings together researchers, data practitioners, and industry representatives from leading organisations. Working in multi-disciplinary teams, they tackle data science challenges posed by partner organisations, which enables researchers to build collaborations and work with real-world industry datasets.

The Data Study Group taking place from 11 to 15 December 2017 will address the following science challenges from partner organisations:

  • Cabinet Office: Can data science help identify potential drivers of extremism?
  • Cabinet Office: Can the news help us understand global instability?
  • Inmarsat: Can we use geospatial time-series analyses to predict the demand for a global satellite communications network?
  • Dstl: Can machine learning be used to identify code vulnerabilities?
  • Codecheck: How do our food choices affect climate change?


You can read about the experiences of researchers who took part in a previous Turing Data Study Group and follow their progress through our social media story.

If you have any queries, email Data Study Group for more information.

Science challenges in detail

Cabinet Office: Can data science help identify potential drivers of extremism?

Countering violent extremism is a major priority for the UK government. Can we utilise data to provide a more detailed picture of an individual’s engagement with extremism in order to identify both the risk factors of radicalisation and potential points for intervention? How might we apply this in real-world scenarios?

The Cabinet Office will provide a range of public datasets, together with a digest of materials relating to extremism that participants can choose whether to access.


Cabinet Office: Can the news help us understand global instability?

Government aims to provide the most accurate information about the world around us, to enable the most effective decisions to be made, and to protect UK citizens globally. But we have sometimes struggled to spot crises unfolding, or assess the risk of instability. Alongside the large and growing amounts of economic, social and polling data published globally, there has been major growth in the number of news outlets, all of them available online.

How can we make sense of all this data, to better spot risks of global instability? Could there be patterns preceding a crisis? What can we learn from the quantitative data which is already available and how, if at all, can analysing the news, and the language within it, help?

As a feasibility test, this challenge will explore English-language news in Hong Kong, alongside other available data and example key events.


Inmarsat: Can we use geospatial time-series analyses to predict the demand for a global satellite communications network?

Inmarsat is an internet service provider whose satellite network serves areas with little other coverage, including oceans, deserts and aeroplanes, which is a growing market. Efficient predictive models are needed to forecast demand on the network. Significant spatial autocorrelations are expected.

How can the data, covering a 4-dimensional grid over geospatial coordinates, time and channel (voice call, streaming, IP traffic, ISDN), be used for predictive modelling? What are the potential applications of the data?
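One way to check the expected spatial autocorrelation is to compute Moran's I on a single latitude/longitude slice of the grid, taken at a fixed time and channel. The sketch below is illustrative only: the rook (edge-sharing) neighbourhood, the function name `morans_i` and the toy grids are assumptions, not part of the Inmarsat data or the challenge brief.

```python
def morans_i(grid):
    """Moran's I for a 2-D grid (list of equal-length rows), rook adjacency.

    Values near +1 mean neighbouring cells carry similar values, values
    near -1 mean neighbours tend to differ, and values near 0 suggest
    no spatial pattern.
    """
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    dev = [[v - mean for v in row] for row in grid]

    cross = 0.0   # sum of dev_i * dev_j over directed neighbour pairs
    weights = 0   # total weight W (each pair counted in both directions)
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += dev[r][c] * dev[rr][cc]
                    weights += 1

    variance_sum = sum(d * d for row in dev for d in row)
    if variance_sum == 0 or weights == 0:
        raise ValueError("Moran's I is undefined for a constant or single-cell grid")
    return (n / weights) * (cross / variance_sum)


# Toy slices with made-up numbers: demand rising smoothly from one edge to
# the other, and a checkerboard in which every neighbour differs.
smooth = [[row] * 4 for row in range(4)]
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
print(morans_i(smooth), morans_i(checkerboard))
```

The smooth slice gives a clearly positive value and the checkerboard a strongly negative one. On real data, a strongly positive value would suggest that a predictive model should borrow information from neighbouring cells.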


Dstl: Can machine learning be used to identify code vulnerabilities?

Static analysis is a common way to check for vulnerabilities in code. Unfortunately, while current static analysis tools have relatively high recall, they also return a large number of false positives, which discourages their use.
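The trade-off described here can be made concrete with the standard definitions of recall and precision. The counts in this sketch are hypothetical, chosen only to illustrate the arithmetic; they do not describe any real tool.

```python
def precision(true_pos, false_pos):
    """Share of reported warnings that are real vulnerabilities."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos, false_neg):
    """Share of real vulnerabilities that the tool reports."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts: the code base holds 100 real vulnerabilities and the
# tool raises 1,000 warnings, of which 90 point at real vulnerabilities.
tp, fp, fn = 90, 910, 10
print(f"recall={recall(tp, fn):.2f} precision={precision(tp, fp):.2f}")
```

Under these made-up counts the tool finds 90% of the real vulnerabilities, yet only 9% of its warnings are genuine, so a developer would have to triage roughly eleven warnings to find one real issue.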

This challenge has two elements: firstly, can you train a machine learning system to recognise code vulnerabilities more sensitively? This is an unusual supervised learning problem, as source code is highly structured – could graph-based techniques or other approaches perform better than a simplistic bag-of-words model?
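A small sketch shows why a bag-of-words model may be too simplistic for source code. The tokeniser and the C-like snippets are assumptions made for illustration, not material from the challenge.

```python
import re
from collections import Counter

# Identifiers, integer literals, or any single non-space symbol.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def bag_of_words(code):
    """Count tokens in a code snippet, discarding order and nesting."""
    return Counter(TOKEN.findall(code))


original = "strncpy(dst, src, n);"
swapped = "strncpy(src, dst, n);"   # arguments reversed: different behaviour
other = "memcpy(dst, src, n);"

# The first two snippets behave differently but contain the same tokens,
# so a bag-of-words model cannot tell them apart.
print(bag_of_words(original) == bag_of_words(swapped))
print(bag_of_words(original) == bag_of_words(other))
```

Representations that keep structure, such as a syntax tree or a data-flow graph, would distinguish the first two snippets, which is the motivation for considering graph-based techniques.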

Secondly, can you create a database of high-quality real-world examples of vulnerable code, applying natural language processing techniques to the commit messages and code comments of publicly available code repositories, such as those on GitHub? Together with the first part, this will allow robust and effective analysis, leading to more resilient software systems.
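As a very rough starting point for the second element, commit messages can be screened for security-related phrases before any heavier language processing is applied. This keyword filter is a crude stand-in for natural language processing; the phrase list, the function name `flag_security_commits` and the example messages (including the placeholder CVE identifier) are all hypothetical.

```python
import re

# Hypothetical phrase list; a real project would need a far broader one.
SECURITY_HINTS = re.compile(
    r"\b(cve-\d{4}-\d+|buffer overflow|use[- ]after[- ]free|sql injection|xss)\b",
    re.IGNORECASE,
)


def flag_security_commits(messages):
    """Return the commit messages that mention a security-related phrase."""
    return [m for m in messages if SECURITY_HINTS.search(m)]


# Made-up commit messages; CVE-0000-0000 is a placeholder, not a real entry.
example_messages = [
    "Fix buffer overflow in header parser",
    "Update README badges",
    "Address CVE-0000-0000 in login handler",
    "Refactor build scripts",
]
for message in flag_security_commits(example_messages):
    print(message)
```

Commits flagged this way would still need review, since a matching phrase does not guarantee that the changed code was vulnerable.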


Codecheck: How do our food choices affect climate change?

Codecheck, the biggest online product guide within the German-speaking area, provides detailed information on ingredients found in food and cosmetics through scanning a product’s barcode.

Using Codecheck’s data, is it possible to develop a model to help estimate a carbon footprint for food? Possible approaches include drawing on similar products that already have a carbon footprint calculation, and improving knowledge of the amount of each ingredient in a product by estimating its composition from similar formulas, the order of the ingredients and the nutritional values.
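The ingredient order alone already constrains the composition. The sketch below assumes that ingredients are listed in descending order by weight, that the listed ingredients make up the whole product, and that the footprint is a weighted sum of per-ingredient emission factors. The function names and the factor values are hypothetical, and nutritional values, which could narrow the range further, are not used here.

```python
def share_bounds(n_ingredients):
    """(min, max) weight share for each label position, first position first.

    With shares sorted in descending order and summing to 1, the k-th
    ingredient can hold at most 1/k of the product, and only the first
    ingredient has a non-zero minimum (1/n).
    """
    return [
        (1 / n_ingredients if k == 1 else 0.0, 1 / k)
        for k in range(1, n_ingredients + 1)
    ]


def footprint_bounds(factors):
    """Lowest and highest footprint per kg consistent with the label order.

    factors: emission factor of each ingredient (kg CO2e per kg), in label
    order. The extreme compositions split the product equally among the
    first k ingredients, so the bounds are the min and max prefix means.
    """
    prefix_means = []
    running = 0.0
    for k, factor in enumerate(factors, start=1):
        running += factor
        prefix_means.append(running / k)
    return min(prefix_means), max(prefix_means)


# Made-up emission factors for a three-ingredient product.
low, high = footprint_bounds([2.0, 1.0, 4.0])
print(share_bounds(3))
print(low, high)
```

A wide range between the two bounds indicates products for which extra information, such as nutritional values or similar products with known compositions, is most valuable.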