Introduction
Stage 1: Precursor Stage (part-time, online)
- The Precursor Stage runs from Monday 2 – Friday 6 September 2024, in the lead-up to the Event Stage.
- The maximum time commitment is 2.5 hours a day.
- This includes online workshops, presentations and team building which will prepare participants for the Event Stage.
Stage 2: Event Stage (full-time, in person)
- The Event Stage runs from Monday 9 – Friday 13 September 2024, and is held at The Alan Turing Institute (British Library, 96 Euston Road, London NW1 2DB).
- Group work begins and continues throughout.
Applicants should be able to commit to the full duration of the event. The Alan Turing Institute is committed to supporting individual circumstances. Please do not hesitate to email [email protected] to discuss any reasonable adjustments.
Challenges
British Geological Survey
Identifying potential for carbon capture and storage in rock.
In 2018, the UK Government set a target of achieving carbon neutrality, or ‘net zero’, by 2050 (2045 in Scotland). This will require a significant reduction of existing emissions and the removal of the remaining positive balance from the atmosphere.
Achieving net zero requires subsurface technologies for extracting geothermal heat and storing waste products such as CO2. Drilling new boreholes to test the suitability of the geology for these projects is enormously expensive, so it is important to extract information from previously collected samples.
A major British Geological Survey (BGS) research objective is to capture new data from existing archives to support new subsurface uses. BGS holds 500,000 thin sections (thin slivers of rock prepared for microscopic investigation) collected during past resource exploration. These could provide data for new subsurface uses, but interpreting them manually is hugely labour-intensive. BGS needs to automate the process of quantifying the properties of large volumes of rock.
This Data Study Group (DSG) challenge will explore how to capture quantitative data from images of thin sections, extracting porosity, permeability and clast ("grain") mineralogy data to inform Carbon Capture and Storage (CCS) and geothermal energy extraction projects.
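As a rough illustration of what capturing quantitative data from an image could mean for porosity, the sketch below treats porosity as the fraction of image pixels classified as pore space. It assumes a greyscale image held as a list of rows and assumes pore pixels are darker than a chosen threshold. The function name, the threshold and the toy image are all hypothetical and are not part of any BGS workflow. Real thin-section images would need a proper segmentation step.

```python
def estimate_porosity(image, pore_threshold):
    """Return the fraction of pixels whose intensity is below pore_threshold.

    Assumption (not from the source): pore space appears darker than grains,
    so a single intensity threshold separates the two.
    """
    total = 0
    pores = 0
    for row in image:
        for value in row:
            total += 1
            if value < pore_threshold:
                pores += 1
    if total == 0:
        raise ValueError("image has no pixels")
    return pores / total


# Toy 4x4 greyscale image (invented values): low numbers stand in for pores.
toy_image = [
    [200, 210, 30, 220],
    [190, 20, 25, 205],
    [215, 198, 202, 40],
    [207, 211, 199, 203],
]

porosity = estimate_porosity(toy_image, pore_threshold=50)
print(f"Estimated porosity: {porosity:.2%}")
```

A pixel-fraction estimate like this only describes the two-dimensional slice. Permeability and mineralogy would need additional modelling that this sketch does not attempt.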
Transport for London
Identifying physical assets on the London Underground from the point cloud.
Transport for London (TfL) allocates £60-80 million annually for maintenance and renewal planning, which necessitates detailed knowledge of current track assets. This project aims to use point cloud data from track trolley scans to identify and classify various track features, including:
- Installed sleeper and rail types
- Sleeper spacing
- Location of track-side equipment (lubricators, point machines, signalling equipment)
- Location of drainage systems
- Accurate positioning of features along the rail (expansion switches, joints, welds)
- Conductor rail (3rd and 4th rail) type, position, and endpoints
Data is collected by an Amberg trolley equipped with a spinning laser, which measures distances from a fixed point above the track centreline. The data covers 750 km of track, representing about 75% of the London Underground network.
The cylindrical point-cloud data is converted into depth-maps. These depth-maps, encoded as flat tables, provide a 3D surface representation of the track environment, allowing for easy visual identification of track objects based on their surface shapes.
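The conversion actually used by TfL is not described here, so the following is only a minimal sketch of one way cylindrical scan points could be binned into a depth-map grid. It assumes each point is a tuple of (distance along the track in metres, scan angle in radians, measured range in metres) and keeps the nearest range in each cell. The function name, parameters and sample points are hypothetical.

```python
import math


def points_to_depth_map(points, n_angle_bins, chainage_step, n_rows):
    """Bin (chainage_m, angle_rad, range_m) points into a 2D grid of ranges.

    Rows index distance along the track, columns index scan angle.
    Each cell keeps the smallest range seen; empty cells stay None.
    This layout is an assumption, not the format used by TfL.
    """
    depth_map = [[None] * n_angle_bins for _ in range(n_rows)]
    full_turn = 2 * math.pi
    for chainage, angle, rng in points:
        row = int(chainage // chainage_step)
        if not 0 <= row < n_rows:
            continue  # point lies outside the stretch of track being mapped
        col = int((angle % full_turn) / full_turn * n_angle_bins)
        col = min(col, n_angle_bins - 1)  # guard against rounding at the edge
        current = depth_map[row][col]
        if current is None or rng < current:
            depth_map[row][col] = rng
    return depth_map


# Three invented scan points: two fall in the same cell, one in another.
sample_points = [
    (0.0, 0.0, 2.0),
    (0.0, 0.1, 1.5),
    (0.6, math.pi, 3.0),
]

depth_map = points_to_depth_map(
    sample_points, n_angle_bins=4, chainage_step=0.5, n_rows=2
)
for row in depth_map:
    print(row)
```

Keeping the nearest range per cell is one possible design choice. Averaging the ranges in a cell would be an equally plausible alternative.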
This challenge aims to convert this understanding into an automated system capable of making object detection and identification decisions.
UCL CHIMERA
Morbidity prediction using preoperative Cardiopulmonary Exercise Test results.
The Cardiopulmonary Exercise Test (CPET) is a procedure used to determine a person's physiological response to exercise.
In hospitals, CPET is commonly used before an operation to determine whether a patient is physically fit enough to undergo surgery and to ensure that anticipated benefits will outweigh the risk of surgery.
University College London Hospital are sharing an extensive, anonymised dataset, which includes high-frequency CPET data, preoperative physiological markers, and post-operative morbidity outcomes for patients who have undergone a cystectomy (removal of the bladder). Cystectomy is a serious operation, and some patients develop complications during recovery.
Being able to identify the patients most at risk of complications will support more effective post-surgery treatment.
Participants are invited to apply modern machine learning techniques to build predictive models of morbidity, specifically respiratory complications, cardiovascular complications and post-surgical infections.
Useful skills: predictive modelling, machine learning, deep learning, regression, Bayesian modelling, time-series data processing, project management.
The Alan Turing Institute and Partners
Data-driven approaches to understand AI competencies.
The UK Government, through its National AI Strategy, highlights AI skills and training as a core enabler of the effective uptake of AI and of strengthening the UK’s position as an AI and science superpower over the coming decade.
The Alan Turing Institute works closely with several industry, professional and governmental bodies in the skills ecosystem, including Innovate UK BridgeAI. This project aims to help employers and employees understand the competencies and upskilling routes required for safe and ethical AI adoption. This will help develop a pipeline of higher-skilled talent in AI. As part of the BridgeAI programme, we would like to investigate:
Can data science techniques (e.g. NLP) be used to better understand how AI skills and competencies are described across national curriculum and professional standards, and used in recruitment for data/AI roles?
This DSG challenge will make use of UK labour market data including career pathways and supply-demand gaps, job description data and definitions of AI skills and competencies (e.g. AI Skills for Business Competency Framework, Institute for Apprenticeships and Technical Education occupational standards, and Alliance for Data Science Professionals professional standards).
It will explore the feasibility of these approaches to improve consistency and quality of standards and job descriptions, and the future implications of AI skills for the labour market. We want to understand how the UK’s education system can adapt to continue to deliver the training needed by the current and future workforce.
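One very simple NLP baseline for comparing how a skill is worded in a standard and in a job description is bag-of-words cosine similarity. The sketch below is only an assumption about where such an analysis might start. The function names are hypothetical and the three text snippets are invented for illustration, not taken from any real framework or job advert.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    dot = sum(counts_a[token] * counts_b[token] for token in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)


# Invented example snippets (not real standards or adverts).
standard_text = "evaluate machine learning models for bias and fairness"
job_ad_text = "experience evaluating machine learning models and monitoring bias"
unrelated_text = "manage office supplies and book travel"

sim_job_ad = cosine_similarity(standard_text, job_ad_text)
sim_unrelated = cosine_similarity(standard_text, unrelated_text)
print(f"standard vs job ad:    {sim_job_ad:.3f}")
print(f"standard vs unrelated: {sim_unrelated:.3f}")
```

Raw word counts treat "evaluate" and "evaluating" as different words and give weight to common words such as "and". Stemming, stop-word removal or sentence embeddings would be natural refinements.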
About the event
What are Data Study Groups?
- These are intensive 'collaborative hackathons' hosted at The Alan Turing Institute (or online), which bring together organisations from industry, government and the third sector, with talented multi-disciplinary researchers from academia.
- Organisations act as Data Study Group 'Challenge Owners', providing real-world problems and datasets to be tackled by small groups of highly talented, carefully selected researchers.
- Researchers brainstorm and engineer data science solutions, presenting their work at the end of the event.
Read reports from previous Data Study Groups to see challenges and outcomes.
How to apply
Applications are now closed.
FAQs
What if I am already part of the Turing community?
If you are employed at one of the universities in The Alan Turing Institute’s Turing University Network (TUN), please contact your Turing Liaison to make them aware of your application. Once contacted, they can provide support, answer questions and involve you as part of the Turing community at your university from now on.
More FAQs for Data Study Group applicants.