Introduction
Stage 1: The precursor stage (part-time)
- The precursor stage will last one week in the run-up to the 'event stage' (online and part-time, 5 - 9 December).
- The maximum time commitment is 3-4 hours a day.
- Online workshops, presentations and team-building to prepare for the 'event stage'.
Stage 2: The event stage (full-time)
- This in-person event will take place at The Alan Turing Institute in London, UK.
- The 'event stage' will run over one week (12 - 16 December).
- We expect participants to spend around 9–10 hours per day on Tuesday, Wednesday and Thursday working on the challenges. Please note that it is not uncommon for participants to work 12-hour days during the week, should they wish. The event will finish at 5pm on Friday 16 December.
Challenges
The challenges are:
Advanced Manufacturing Research Centre (AMRC)
Data augmentation and synthetic data generation for low-frequency and sparse data problems
AMRC is a network of world-leading research and innovation centres working with manufacturing companies of any size from around the globe.
In this DSG challenge, we aim to assess the usefulness of data augmentation and synthetic data generation in detecting, predicting, and attributing costly faults in the production process of manufacturing companies. Any parts rejected in the supply chain are costly for the manufacturer, yet the reason for rejection is often difficult to trace and improve upon. The reason for this difficulty is the lack of data available on the manufactured part – both due to it undergoing several processes at different sites and the lack of data available for collection, especially in digital form, in the manufacturing world. In this DSG, we will gauge the feasibility of data augmentation and synthetic data generation in this low data availability setting.
Outcomes from the project will be assessed for their potential to guide AMRC in improving their data gathering and analysis pipelines and ultimately in driving improvements to machining processes.
The Centre for Environment, Fisheries and Aquaculture Science (CEFAS)
Counting sea pens from ocean floor video footage
Sea pens (marine coelenterates forming a feather-shaped colony with a horny or calcareous skeleton) are an indicator of the health of muddy ecosystems. CEFAS has a large collection of video footage of the sea floor, collected over the period 2014-2021, that is used to quantify sea pens. This is a challenging task as the equipment used to collect the footage has evolved over time. There are varying levels of visibility due to the different lighting systems employed, which makes it difficult to identify sea pens under certain conditions.
The objective of this challenge is to investigate the potential of computer vision and deep learning to undertake a large-scale, automated study of the available data, which has been prepared and labelled by CEFAS in advance of the DSG. Some of the questions to be explored are:
• Can sea pens be reliably localized and classified from the available video footage?
• Is it possible to devise a methodology that counts each sea pen only once?
• Can the video footage be enhanced and standardized to correct for variable lighting, camera and other technical conditions?
The outcomes of this DSG challenge could have an important impact on marine environment research. In a preliminary study, CEFAS performed an expert manual detection and labelling of a subset of videos. A number of machine learning algorithms which classify sea pens with varying accuracy were applied to the data. A key challenge identified was that of standardizing the footage collected from different capture systems.
Department for Transport (DfT)
Understanding the behaviour of transport users
DfT is always looking to improve transport services for customers. It conducts a variety of activities to gain an understanding of the behaviour of transport users. However, surveys are usually looked at in isolation. Can we gain knowledge from user behaviour by analysis of large-scale surveys such as the National Travel Survey?
The dataset for this challenge consists of records from the National Travel Survey. With this data, we aim to answer questions and understand patterns related to user transport patterns, pain points while travelling, reasons for journeys, geographic differences in travel patterns, and more.
By understanding user transport habits, DfT will be able to make service provision decisions that can improve the travel experience while lowering costs.
University College London (UCL)
DeLTA: Deep Learning Techniques for noise Annoyance detection
UCL and NTU Singapore have now compiled two of the largest datasets of urban soundscape recordings featuring annoyance ratings and sound source labels. This DSG will invite participants to develop a system to identify sound sources from audio recordings, which can feed a model trained to predict annoyance ratings. This will enable a smart sensor system which can map urban soundscapes based on human perception, rather than merely documenting noise levels. Noise sensors which can map and flag high annoyance levels will contribute to an improved knowledge of the impacts of noise annoyance on health and well-being in urban spaces.
Chronic high noise annoyance impacts 22 million people in Europe alone, with a broad range of public health outcomes, leading to increased risks of cardiovascular and metabolic disorders. Urban soundscapes are complex environments, with overlapping sound sources each competing for our attention against an ever-shifting background. Although discussion of “urban noise” often focuses only on traffic noise, all sounds in a city contribute to the production or restoration of stress. Capturing these effects requires identifying the different sound sources in a complex environment in an automated way. A growing challenge for the field of urban noise and soundscapes has been to provide a more nuanced and accurate view of the prevalence of annoying and impactful sounds across cities. This challenge will attempt to apply state-of-the-art AI sound source identification models to these complex environments to create a sound-source-aware annoyance prediction model to be deployed in smart city systems.
Useful skills: Machine hearing, Environmental Sound Recognition, CNNs, TCNs, and LSTMs, music sentiment analysis, data augmentation, signal processing, deep learning, predictive/supervised modelling, remote sensor experience, smart city applications, broad data science and programming skills with an interest in audio
About the event
What are Data Study Groups?
- Intensive 'collaborative hackathons' hosted at the Turing, which bring together organisations from industry, government, and the third sector, with talented multi-disciplinary researchers from academia.
- Organisations act as Data Study Group 'Challenge Owners', providing real-world problems and datasets to be tackled by small groups of highly talented, carefully selected researchers.
- Researchers brainstorm and engineer data science solutions, presenting their work at the end of the week.
How to apply
Applications are now closed. Other Data Study Group events will be posted on this website in the new year.
Why apply?
The Turing's Data Study Groups are popular and productive collaborative events and a fantastic opportunity to rapidly develop and test your data science skills with real-world data. The event also offers participants the chance to forge new networks for future research projects and build links with The Alan Turing Institute – the UK’s national institute for data science and artificial intelligence.
It’s hard work, a crucible for innovation and a space to develop new ways of thinking.
Read reports from previous Data Study Groups to see challenges and outcomes.
FAQs
What if I am already part of the Turing community?
If you are employed by one of the Institute’s 13 university partners, please contact your University Liaison Manager – list available here – to make them aware of your application. They can provide support, answer questions and involve you as part of the Turing community at your university from now on.
If you are employed at a university that received a Turing Network Development Award, please contact your Award lead – list available here (scroll to the bottom of the page) – to make them aware of your application.
More FAQs for Data Study Group applicants.
Find out more
Learn more about being a DSG participant including FAQs
How to write a great Data Study Group application
Queries can be directed to the Data Study Group Team