Predictive graph analytics and propagation of information in networks

Leveraging graph optimisation tools to understand how news sentiment spreads in a financial network

Project status

Finished

Introduction

This project addresses the problem of predicting the behaviour of agents from partial a priori news sentiment and the structure of the network. This goal is pursued using novel methodologies from optimisation, clustering and portfolio analytics. Possible applications include improving news sentiment dissemination, assessing impact on trading behaviour, enhancing risk models, and detecting insider trading activities.

Explaining the science

Summary

This project develops algorithms to understand how news sentiment propagates through the network, from the news-affected companies (the minority class) to the remaining companies (the majority class). The approach is broadly applicable to settings where only a sparse signal is available (e.g., news sentiment for a subset of nodes) and one would like to understand how the available signal measurements propagate through the network to the remaining nodes.

Technical description

This approach can be seen as an instance of the group synchronisation problem over Z_2 with anchor information, where one would like to estimate n unknown group elements from noisy pairwise ratios of group elements, subject to the constraint that a subset of the elements (the anchor set) is known a priori. The empirical correlation matrix (or a denoised version of it) serves as the matrix of weighted pairwise group ratios, and one would like to find an assignment of ±1 to the n nodes of the signed graph, subject to the constraint that the group elements of some nodes are known (+1 for positive news, and −1 for negative news). In prior work, motivated by structural biology applications, the researchers have solved such instances via semidefinite programming (SDP) and quadratically constrained quadratic programming, and more recently via methods inspired by non-convex optimisation and fast SDP via a Burer-Monteiro approach.
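
As an illustration of the anchored Z_2 synchronisation problem described above, the following is a minimal sketch that uses a plain spectral relaxation (the leading eigenvector of the signed matrix), which is a simpler stand-in and not the SDP, QCQP or Burer-Monteiro methods used by the researchers. The function name `anchored_z2_sync`, the `anchors` dictionary and the synthetic data are assumptions made purely for this example.

```python
import numpy as np

def anchored_z2_sync(C, anchors):
    """Estimate +1/-1 labels from a symmetric signed matrix C.

    C       : (n, n) symmetric array; C[i, j] is a noisy estimate of x_i * x_j
    anchors : dict {node index: +1 or -1}, the labels known a priori
              (hypothetical input format chosen for this sketch)
    """
    C = np.asarray(C, dtype=float)
    # Spectral relaxation of: maximise x^T C x over x in {+1, -1}^n.
    # eigh returns eigenvalues in ascending order, so the last column
    # is the leading eigenvector.
    _, eigvecs = np.linalg.eigh(C)
    v = eigvecs[:, -1]

    idx = np.array(list(anchors.keys()))
    known = np.array([anchors[i] for i in idx], dtype=float)

    # An eigenvector is only defined up to a global sign; the anchors
    # resolve that ambiguity.
    if np.dot(v[idx], known) < 0:
        v = -v

    x = np.where(v >= 0, 1, -1)
    x[idx] = known.astype(int)  # enforce the anchor constraint exactly
    return x

# Synthetic example (assumed data): planted signs plus symmetric noise.
rng = np.random.default_rng(0)
n = 30
z = rng.choice([-1, 1], size=n)
noise = rng.normal(scale=0.1, size=(n, n))
C = np.outer(z, z) + (noise + noise.T) / 2
estimate = anchored_z2_sync(C, {0: int(z[0]), 1: int(z[1]), 2: int(z[2])})
print("fraction recovered:", np.mean(estimate == z))
```

In this sketch the anchors are only used to fix the global sign and to overwrite the anchored entries; a constrained formulation would let the anchors influence the estimate for every node.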

On the clustering side, the researchers have recently developed algorithms for clustering of signed networks. Clustering is one of the most widely used techniques in data analysis, and aims to identify groups of nodes that exhibit similar features. Much of the existing literature has focused on unsigned networks, whose edges only have non-negative real weights. However, negative weights can be used to denote dissimilarity or distance between a pair of nodes in a network. In this project, the network arising from the correlation of historical returns of the S&P 500 instruments displays both positive and negative values. To handle this, the researchers rely on principled clustering algorithms they recently developed, which explicitly take the negative weights into account by redesigning the graph-based objective function to be optimised.
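
To make the idea of a signed clustering objective concrete, here is a small sketch based on the classical signed Laplacian (degree computed from absolute weights) for a two-way partition. This is a simpler baseline chosen for illustration, not the SPONGE or MBO algorithms listed in the references, and the function names `signed_laplacian_bipartition` and `disagreement` are hypothetical.

```python
import numpy as np

def signed_laplacian_bipartition(A):
    """Split a signed network into two clusters.

    A : (n, n) symmetric array with positive and negative edge weights.
    Uses the signed Laplacian L = D - A, where D holds the sums of
    absolute weights, and thresholds its lowest eigenvector at zero.
    """
    A = np.asarray(A, dtype=float)
    D_bar = np.diag(np.abs(A).sum(axis=1))
    L_bar = D_bar - A
    _, eigvecs = np.linalg.eigh(L_bar)   # ascending eigenvalues
    v = eigvecs[:, 0]
    return np.where(v >= 0, 0, 1)

def disagreement(A, labels):
    """Weight of negative edges inside clusters plus positive edges
    across clusters; lower is better for a signed clustering."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    pos = np.clip(A, 0, None)
    neg = np.clip(-A, 0, None)
    # Each undirected edge appears twice in a symmetric matrix.
    return (neg[same].sum() + pos[~same].sum()) / 2.0

# Toy signed network (assumed data): two groups of three nodes,
# positive edges within groups and negative edges between them.
s = np.array([1, 1, 1, -1, -1, -1])
A = np.outer(s, s).astype(float)
np.fill_diagonal(A, 0.0)
labels = signed_laplacian_bipartition(A)
print(labels, disagreement(A, labels))
```

Extending this to more than two clusters would typically involve several eigenvectors followed by a clustering step such as k-means, which is omitted here.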

Figure (node graph): This project's clustering algorithms pertain to networks that have both positive and negative edge weights. This figure represents a prototypical configuration (green, blue, and orange correspond to three different clusters). An objective function is optimised that promotes placing nodes connected by positive edges in the same cluster, while separating nodes connected by negative edges.

Figure (node graph): Schematic representation of news propagation in a network of companies. On any given day, only a subset of the companies (a and b) display news sentiments. These can be positive or negative values (represented by red arrows pointing up or down, respectively). The available news sentiments are used to infer sentiments for the nodes in the rest of the network (c, d, e) by solving a specific optimisation problem.

References

M. Cucuringu, A. Pizzoferrato, News sentiment propagation in financial networks via constrained optimization, (in progress)

M. Cucuringu, A. Pizzoferrato, Y. van Gennip, An MBO scheme for clustering and semi-supervised clustering of signed networks, arXiv:1901.03091 (2019), https://arxiv.org/abs/1901.03091

M. Cucuringu, P. Davies, A. Glielmo, H. Tyagi, SPONGE: A generalized eigenproblem for clustering signed networks, AISTATS (2019), https://arxiv.org/abs/1904.08575

Project aims

The project explores the Refinitiv News Analytics dataset, a machine-readable news feed that includes news items from tens of news media outlets, starting from 1996. Each firm-specific news item provides several quantitative scores, including sentiment scores (indicating whether a story is positive, negative, or neutral), relevance scores (measuring how relevant a story is to a firm), and uniqueness scores (reflecting the novelty of the news).

The project aims to develop a methodology that is broadly applicable to data sets that exhibit a low-dimensional structure, and where one would like to understand the evolution and dynamics of the network under certain shocks inflicted at various nodes in the network (e.g., news can be seen as “shocks” administered to the system). This work will constitute a proof-of-principle technique for inference of dynamical changes in the behaviour and structure of time-dependent networks.

More technically, the researchers aim to assess the performance of news sentiment propagation against the systematic and idiosyncratic components of future returns, and to explore the interplay with the clustering structure of signed networks arising from prices and news data and, in future work, from the Refinitiv Knowledge Graph.
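
The source does not specify how returns are split into systematic and idiosyncratic parts. As one common possibility, the sketch below assumes a single-factor market model fitted by ordinary least squares, with an equal-weighted average of the instruments as the market proxy; both the model choice and the function name `market_model_decomposition` are assumptions for illustration only.

```python
import numpy as np

def market_model_decomposition(returns):
    """Split returns into systematic and idiosyncratic components.

    returns : (T, n) array of daily returns, one column per instrument.
    Assumes r_i = alpha_i + beta_i * r_market + eps_i, where r_market is
    the equal-weighted cross-sectional mean (an assumed proxy).
    """
    R = np.asarray(returns, dtype=float)
    market = R.mean(axis=1)
    X = np.column_stack([np.ones_like(market), market])
    # coef has shape (2, n): row 0 holds intercepts, row 1 holds betas.
    coef, *_ = np.linalg.lstsq(X, R, rcond=None)
    systematic = X @ coef
    idiosyncratic = R - systematic
    return systematic, idiosyncratic

# Synthetic returns (assumed data) for five instruments over 250 days.
rng = np.random.default_rng(0)
R = rng.normal(scale=0.01, size=(250, 5))
systematic, idiosyncratic = market_model_decomposition(R)
print(idiosyncratic.std(axis=0))
```

A propagated sentiment score could then be compared with the idiosyncratic component on subsequent days, for example through a rank correlation.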

Applications

This research project is developed in collaboration with Refinitiv, which provides news and financial data. This partnership makes it possible to tailor the methodologies directly to the applications. The project's approach consists of sourcing daily data to propagate the news sentiment information in a suitably defined network of the Standard & Poor's 500 companies. On any given day, only about one third of the instruments listed in the S&P 500 index have at least one associated relevant piece of news. The main benefit of this project's framework is to provide a propagated news sentiment for the remaining set of instruments.
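
To illustrate what a propagated sentiment for uncovered instruments might look like, the following sketch builds the empirical correlation matrix from daily returns and assigns each uncovered instrument a correlation-weighted average of the known sentiment scores. This one-step weighted average is an assumed simplification, not the project's optimisation method, and `propagate_sentiment` and its input format are hypothetical.

```python
import numpy as np

def propagate_sentiment(returns, known_sentiment):
    """One-step propagation of sentiment to instruments without news.

    returns         : (T, n) array of daily returns, one column per instrument
    known_sentiment : dict {column index: sentiment score} for the
                      instruments that have news on the given day
    """
    C = np.corrcoef(np.asarray(returns, dtype=float), rowvar=False)
    idx = np.array(sorted(known_sentiment))
    s_known = np.array([known_sentiment[i] for i in idx], dtype=float)

    W = C[:, idx]                     # correlations with covered instruments
    denom = np.abs(W).sum(axis=1)
    denom[denom == 0] = 1.0           # guard against division by zero
    s = (W @ s_known) / denom         # negative correlation flips the sign
    s[idx] = s_known                  # covered instruments keep their scores
    return s

# Toy data (assumed): instrument 1 moves against instrument 0,
# instrument 2 moves with it; only instrument 0 has news.
a = np.array([0.01, -0.02, 0.03, 0.0])
R = np.column_stack([a, -2 * a, 0.5 * a + 0.001])
print(propagate_sentiment(R, {0: 0.8}))
```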

In addition, by analysing the spread of information it is possible to rank companies in terms of their influence as assessed by various network centrality measures. For example, companies which are better connected (e.g., they have a higher node degree or play a key role in the network topology) could be more influential in terms of how sentiment information spreads over the network. Furthermore, studying unusual correlations between news sentiment and historical idiosyncratic returns could provide insights for monitoring insider trading.
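
As a simple example of such a ranking, the sketch below uses weighted node degree (the sum of absolute correlations with all other nodes) as the centrality measure. The choice of absolute values, the function name `rank_by_strength` and the placeholder company names are assumptions for illustration; other centrality measures could be substituted.

```python
import numpy as np

def rank_by_strength(C, names):
    """Rank nodes by weighted degree in a signed correlation network.

    C     : (n, n) symmetric correlation matrix
    names : list of n node labels
    Returns a list of (name, strength) pairs, most connected first.
    """
    W = np.abs(np.asarray(C, dtype=float))  # np.abs returns a new array
    np.fill_diagonal(W, 0.0)                # ignore self-correlation
    strength = W.sum(axis=1)
    order = np.argsort(-strength, kind="stable")
    return [(names[i], float(strength[i])) for i in order]

# Placeholder correlation matrix and names (assumed data).
C = np.array([
    [1.0, 0.9, 0.8],
    [0.9, 1.0, 0.1],
    [0.8, 0.1, 1.0],
])
for name, strength in rank_by_strength(C, ["A", "B", "C"]):
    print(name, round(strength, 2))
```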

Risk models provided by various vendors are popular amongst practitioners, and provide, for a given set of factors, the exposure of each instrument to each factor. In most cases, the model measures risk factors associated with three main types of components: industry risk, risk from exposure to different investment themes (e.g., size, volatility, momentum, beta to the market) and company-specific risk. Further enhancing such risk models with factors derived from news data and the Knowledge Graph is an interesting future direction to explore.
