Optimising analysis of network graphs

Investigating how to optimise software for analysing complex data networks on state-of-the-art processors

Project status

Finished

Introduction

Large-scale analysis of symbolic representations of networks, or graphs, is fundamental to research in a number of fields, from cyber security to biology. However, such analyses often require expensive supercomputing resources, which the analysis methods are not always designed to use efficiently. Likewise, the resources themselves are often not optimised to run such analyses. This project aims to investigate how to optimise graph processing applications to achieve substantially improved performance on multi-core processing devices, such as those produced by Intel, to benefit UK industry and research.

Explaining the science

In graph theory, a graph is a symbolic representation of a network and its connectivity, expressed as a set of nodes joined by links. Analysing these graphs can reveal complex relationships between different types and feeds of real-world data that would otherwise be difficult to uncover.

Large-scale graph processing is increasingly a fundamental component of many data science workloads, in domains as diverse as cyber security, machine learning, and computational biology. It is therefore of paramount importance to industry, academia, and several of the Turing’s research projects.

Growing data volumes, and pressure to reduce time-to-solution for end users, mean that processing large graphs frequently requires expensive supercomputer resources. The performance of these workloads on existing supercomputer systems is often unsatisfactory, and likely trends in the design of future supercomputer architectures will pose further challenges. Performance improvements are therefore likely to come from implementations of analyses that more efficiently exploit the parallel structure of current and future supercomputer systems.

Project aims

This project will undertake research to determine whether novel hardware technologies can be used to improve on the current state of the art, and examine how software applications should be enhanced to make the best use of these technologies.

The project will take an end-to-end view of the entire analytics pipeline, to learn how best to improve processor and system design, and how existing and future graph applications should be implemented to make optimal use of current and future supercomputer architectures.

In particular, this project will collaborate with Intel to improve performance on their multi-core processing devices, such as the Intel Xeon Phi and the Intel Xeon architecture, as well as other emerging processing technologies. The utility of novel memory technologies, such as Intel 3D XPoint, will also be examined as a vehicle for further improving the capabilities of graph analytics.

The project will also assess the performance of the recently proposed GraphBLAS specification, which aims to define standard building blocks for graph algorithms. Techniques for improving the programming models that support these applications, and the performance of their underlying storage models, will also be examined.
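The GraphBLAS approach expresses graph algorithms as sparse linear algebra over semirings. As a rough illustration only (this is not the GraphBLAS C API, nor any Intel library), the Python sketch below writes breadth-first search as repeated vector-matrix products over the Boolean (OR, AND) semiring, using the complement of the visited set as a mask so that nodes are not revisited. The function names `vxm_or_and` and `bfs_levels`, and the dictionary-of-sets matrix storage, are hypothetical choices made for this sketch; real implementations use compressed sparse formats and heavily optimised kernels.

```python
from typing import Dict, Set

# Hypothetical toy storage: row index -> set of column indices holding True.
SparseBoolMatrix = Dict[int, Set[int]]


def vxm_or_and(frontier: Set[int], a: SparseBoolMatrix) -> Set[int]:
    """Vector-matrix product over the Boolean (OR, AND) semiring.

    Entry j of the result is OR_i (frontier[i] AND a[i][j]). With sparse
    Boolean storage this reduces to the union of the rows the frontier selects.
    """
    result: Set[int] = set()
    for i in frontier:
        result |= a.get(i, set())
    return result


def bfs_levels(a: SparseBoolMatrix, source: int) -> Dict[int, int]:
    """Breadth-first search as repeated masked vector-matrix products.

    Returns a dict mapping each node reachable from `source` to its depth.
    """
    levels: Dict[int, int] = {source: 0}
    frontier: Set[int] = {source}
    depth = 0
    while frontier:
        depth += 1
        # Apply the complement of the visited set as a mask.
        frontier = vxm_or_and(frontier, a) - set(levels)
        for node in frontier:
            levels[node] = depth
    return levels


if __name__ == "__main__":
    # Directed edges: 0->1, 0->2, 1->3, 2->3, 3->4; node 5 is unreachable from 0.
    adjacency: SparseBoolMatrix = {0: {1, 2}, 1: {3}, 2: {3}, 3: {4}, 5: {0}}
    print(bfs_levels(adjacency, 0))
```

Running the script prints the depth of each node reachable from node 0. The design idea this illustrates is that, once an algorithm is phrased in terms of a small set of matrix primitives, an optimised library can tune those primitives for a particular processor without the algorithm itself changing.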

Finally, software technologies will be developed to improve the productivity of researchers implementing graph processing applications.

Applications

All development work undertaken will be based on open standards, and outputs from the project will, wherever possible, be released as open source. These outputs should help ensure that UK data science researchers remain at the forefront of the field, and enable UK industry to benefit from the improvements in capability that this research aims to deliver.
