Data science at scale

Building upon advances in high-performance computer architectures, through algorithm-architecture co-design, with applications including health and life science – this programme has now ended

Status

Finished

This programme ran from 2016 to 2021.

Advances in high-performance computer architectures, and the way algorithms can take advantage of them, have been transformative for a variety of data science tasks.

This scientific programme at the Turing, in partnership with Intel, built upon these successes by co-designing algorithms and computer architectures, pursuing a range of applications, and training a new generation of data scientists with the computational skills required to solve the major data analysis tasks of the future.

Programme challenges

Researchers from both the Turing and Intel worked together towards the shared goal of shaping the future of computation for data science. This involved the following challenges and aims.


Smaller pebbles are being collaboratively stacked with larger ones to represent the principle of co-design

Algorithm-architecture co-design

As data science continues to grow as an industry and research sector, data-driven algorithms such as those required by deep learning – multi-layer networks that identify features at progressively higher levels of abstraction – take up an increasing amount of valuable time and energy in data centres. This creates a need to rethink how the technical challenges raised by this emerging science are managed.

Hardware must be designed to suit the needs of data science algorithms, and algorithms must likewise be designed to suit the capabilities of the hardware.

 


Data science students working together at a table

Training a new generation

The significant demand for data science skills from industry, government, and academia creates a supply problem: the UK faces a major skills gap that could limit the benefits data science and AI are expected to bring to our economy and society.

As well as conducting research, the programme’s partnership trained a new generation of data scientists through the Turing’s doctoral programme, ensuring students were equipped with the latest data science techniques, tools, and methodologies.

 


Computer servers

Improving hardware

Intel dedicated a hardware architecture team at the Institute’s facilities in the University of Edinburgh.

The programme aimed to dramatically increase the speed and efficiency of data-driven computing tasks and to provide Intel with the tools to build the next generation of computer processors and high-performance systems.

Impact

Two people next to server racks

Impact story: Co-designing computing with Intel

In a high-performance computing (HPC) environment, such as a data centre with hundreds or thousands of interconnected computers, well-designed algorithms and architectures allow huge data analysis tasks to be performed. One example is classifying millions of images of tissue samples to identify whether they contain anomalous features that should be examined by a doctor.

While these high-performance systems operate well for some computing needs, they often run at less than half their full capacity for many data science and machine learning tasks. Researchers at The Alan Turing Institute have been working in collaboration with Intel to co-design better architectures for their HPC systems. The collaboration has looked at how to improve communication between multiple machines that are sharing the workload of massive analyses, as well as how to rethink the formatting of the data used in HPC, to improve performance on data science and machine learning problems.

The output of the work is not only helping Intel improve their products and services, but also enabling data scientists to manage and analyse massive datasets with greater efficiency, in a range of machine learning applications.

Read the full impact story here.

 

Labelled image of cells

Collaboration between the programme and the NHS

A collaboration was announced between the data science at scale programme, the University of Warwick, Intel, and University Hospitals Coventry & Warwickshire NHS Trust (UHCW). It involves using ground-breaking artificial intelligence techniques to detect and classify cancer cells more efficiently and accurately.

Scientists at the University of Warwick’s Tissue Image Analytics (TIA) Laboratory – led by Professor Nasir Rajpoot from the Department of Computer Science – are creating a large digital repository of a variety of tumour and immune cells found in thousands of human tissue samples, and are developing algorithms to recognise these cells automatically.

Professor David Snead, clinical lead for cellular pathology and director of the UHCW Centre of Excellence, comments that ‘the successful adoption of these tools will stimulate better organisation of services, gains in efficiency, and above all, better care for patients, especially those with cancer.’

Read the ZDNet article about the announcement – “NHS taps artificial intelligence to crack cancer detection”

Funders