We have never been more connected. The gadgets we carry in our pockets mean that we’re only a button-press away from people on the other side of the planet. At the same time, the rise of big data has resulted in a tsunami of information about the people we chat to, the places we visit, the purchases we make. Tools are increasingly needed to make sense of these networks, but current software lacks one crucial aspect: the ability to track how connections change over time.
A new analysis tool called Raphtory, developed with funding from The Alan Turing Institute, is the first to be built specifically for this purpose. By plotting how networks evolve (known as ‘dynamic graphs’), it promises to take data science into unmapped territory. The list of potential applications is mind-boggling, from tracking changes in the meanings of words, to spotting cryptocurrency fraud, monitoring toxic online communities and understanding urban transport patterns.
Mathematicians have been interested in networks, known in the community as ‘graphs’, since the 1700s, drawing up diagrams composed of points (‘nodes’) connected by lines (‘links’). The vast majority of research has so far focused on static graphs – graphs captured at a particular moment in time, or averaged over a certain timescale.
But while static graphs can provide useful information, they ignore an important dimension: time.
“In the real world, graphs are rarely static,” says Richard Clegg, a mathematician at the Turing’s partner university Queen Mary University of London (QMUL), who is co-leader of the Raphtory project.
Dynamic graphs incorporate information about the nodes and links at different points in time, allowing researchers to find out how the graph has changed over the past month or year, for instance, or after a new node or link was added.
Current systems for analysing graphs were built with static graphs in mind. This makes them cumbersome when dealing with the time element. If you want to analyse the graph at a new point in time, you have to reload the graph at that time and reanalyse the data. Similarly, if you want to update the data, you have to reload the entire dataset, not just the data of interest. This is slow and inefficient, especially with the huge datasets that are becoming the norm in data science.
Raphtory (its name is a play on the words ‘graph’ and ‘story’) addresses these problems by automatically updating the dynamic graph as new data comes in. Crucially, the software keeps a rolling record of any changes to the nodes and links, giving the user instant access to the graph’s entire history.
A moment in time
Raphtory is a project that’s been brewing for some time. “I’d been wanting to build this kind of software for years,” says Clegg. In 2016, Clegg moved to QMUL and made contact with Felix Cuadrado, a computer scientist there who researches analysis methods for large-scale dynamic graphs. Fortuitously, Cuadrado (who is now a Turing Fellow) had just been awarded a grant to develop exactly the kind of software that Clegg had in mind, so the two began a collaboration, working with PhD student Ben Steer to develop a prototype. In 2019, they obtained additional funding from the Turing’s ‘AI for science and government’ (ASG) programme and hired computer scientist and programmer Imane Hafnaoui to help them sculpt the software into an open-source resource that can be used by other researchers. (Building open-source infrastructure is a key goal of the Turing’s ‘Tools, practices and systems’ programme, which the project is also part of). Meanwhile, Steer has founded a spin-out company, Chorograph.
Raphtory works by splitting dynamic graphs over multiple networked computers, increasing the amount of memory for data storage and processing. When data is fed into Raphtory (this can either be previously stored data or a real-time stream), the algorithm analyses the data to figure out which section (‘partition’) of the graph needs to be updated, sending the updates to the computer that’s hosting the relevant node or link. This is trickier than it sounds.
“If you have two nodes on different computers, then any link between them travels between the two computers,” says Clegg. “You need to make sure that this link is never broken and is updated on both computers. That one thing took months to code.”
Once the data is in the system, the user is free to analyse the graph. And this is where the really exciting stuff begins: using Raphtory to solve real-world problems.
Dark side of the web
The internet offers the perfect testing ground for Raphtory. Take social networks. Every social network is essentially a dynamic graph, with users as the nodes, and interactions between them as the links.
Recent work at QMUL has focused on a social network called Gab that is similar in format to Twitter. Gab is known to attract users from the alt-right, and previous research has shown that it hosts a relatively high amount of hate speech, and users sometimes arrive after being banned from other social networks.
Gab presents a unique opportunity for studying how social networks grow, because it is possible to obtain datasets going back to the site’s launch in August 2016. Naomi Arnold, a PhD student at QMUL, is using Raphtory to analyse all of the interactions between users up until May 2018 (interactions are defined as a user replying to another user’s post, or reposting it).
Arnold’s analysis shows that the network experienced bursts of activity on certain days, coinciding with events of potential interest to the alt-right community, such as Donald Trump’s US election win in November 2016, and the ‘Unite the Right’ white supremacist rally in Charlottesville, Virginia in August 2017. Her analysis also reveals that Gab consists mostly of a small ‘elite’ of influencers writing to strangers, rather than interactions between friends.
One of the ultimate aims of this research is to understand how toxic communities might evolve within social networks. “We could use Raphtory on any social network to look for communities that are becoming more insular and interacting less with the rest of the network,” says Arnold. “This might be a sign of the community becoming more extreme in its views.” Combining this with analysis of the content being posted could provide an automatic way of flagging up any communities that are potentially problematic.
Staying in the darker realm of cyberspace, Raphtory also offers promise for tracking illegal cryptocurrency transactions. Here, the nodes of the dynamic graph are the cryptocurrency buyers and sellers, while the links are their transactions. By looking for unusual changes in the graph’s structure, it might be possible to spot fraud.
“Cryptocurrency criminals use a number of tactics to make it more difficult to trace them,” says Cuadrado, “such as using a ‘mixer’ node to combine different transactions into one pot, muddling up the data. We’ll be able to use graph analysis to look for these kinds of patterns.”
The Raphtory team has now started to work with the blockchain analysis company Chainalysis, who in November 2020 used similar techniques to recover more than US$1 billion in stolen bitcoin tied to the online black market Silk Road.
Let’s get physical
It’s not just in the virtual world that Raphtory could have impact. Transport networks, for example, are dynamic graphs. As we move around our towns and cities, we create a graph in which the locations are the nodes, and our journeys between them are the links.
To explore how Raphtory might be applied to transport data, Clegg and Cuadrado have teamed up with Susan Grant-Muller, a Turing Fellow and statistician at the Institute for Transport Studies, University of Leeds. They’ve begun to look at the effectiveness of campaigns designed to subtly influence citizen behaviour – a field of research known as ‘nudge theory’. One example is a six-month campaign run in Newcastle upon Tyne during 2018, which encouraged residents to download an app that rewarded them with points and prizes if they made greener transport choices by walking, cycling, or using public transport or electric cars instead of fossil-fuelled cars.
“Over 1,400 people downloaded the app,” says Clegg. “They agreed to have their location tracked as they moved around the city, and we’re using dynamic graph analysis to find out how the app incentives affected their behaviour.” The team is able to infer a user’s mode of transport from the time it takes them to travel between locations. Finding out what kinds of incentives led to more people walking or cycling, for instance, could help authorities to develop more effective public health initiatives.
There’s also another aspect of our day-to-day lives that is ready-made for dynamic graph analysis: language. In this case, the words are the nodes and the connections between them are the links. Two words might be linked in a graph if they appear close together in the same sentence, for instance. Visualising language in this way allows researchers to look for changes in the connections between words, and hence their meanings, over time.
To this end, the Raphtory team is working with Barbara McGillivray, a computational linguist and Turing Research Fellow based at the University of Cambridge, to analyse a dataset containing all of the words published on UK internet pages between 1996 and 2013. “One change we’re looking for is where words go from negative to positive meaning,” says Clegg, “such as with ‘bad’, ‘sick’ and ‘wicked’.”
This is a wonderful example of the relevance of dynamic graphs to real life. Now the team is hoping to foster a wider community of Raphtory users, with the aim of dreaming up even more inventive uses for the software.
“The more time I spend working on dynamic graphs, the more excited I am by their potential,” says Cuadrado. “We’re hoping that Raphtory will become the go-to tool for their analysis.”
Main image: Hanson Lu / Unsplash