Analysing noisy data streams

Developing accurate, efficient, and robust statistical methodologies for analysing complex and noisy streams of data

Introduction

Recent years have seen an increase in the range of available real-time financial market datasets, from online transaction records to high-frequency limit order books (which record orders to buy or sell a stock at a specified price or better). Extracting information from these financial data streams using machine learning techniques is challenging because of their low signal-to-noise ratio and complex multi-modality: noise in the data can be misinterpreted as signal, leading to potential financial loss and even financial crises. This project aims to develop statistical methodologies that will improve the predictive power and robustness of analysis of such noisy data.

Explaining the science

Today most financial market records and observations can be captured electronically and stored as an unending stream of data. Whilst in principle this gives investors a broader range of market-related information, traditional statistical methods have very limited capacity to extract insights from such large, unstructured data streams.

However, following its successes in speech recognition and computer vision, machine learning offers potential solutions to this problem. Emerging research is applying machine learning methods to the prediction of future stock market behaviour. The main challenges are the variability of market data and its low signal-to-noise ratio. This project will approach these challenges by combining modern machine learning and statistical methods with the path signature technique.

See the related project ‘Capturing complex data streams’ for more information about path signatures. 

The path signature provides a unified way to deal with mixed-frequency or partially observed data, and can significantly reduce the dimensionality of data problems. Path signatures can then be combined with machine learning methods, e.g. deep learning or Gaussian processes, to improve the accuracy and efficiency of financial data forecasting.
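As a rough illustration of the idea (not the project's own implementation), the sketch below computes the first two levels of the signature of a piecewise-linear path using Chen's identity. The function name `signature_depth2` and the toy two-dimensional path are assumptions made for this example only. It shows two properties mentioned above: the output has a fixed size (d + d² numbers for a d-dimensional stream) however many observations the stream contains, and sampling the same path more finely leaves the signature unchanged.

```python
import numpy as np

def signature_depth2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    `path` is an (n_points, d) array of observations (hypothetical input format).
    Returns (level1, level2) with shapes (d,) and (d, d).
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    level1 = np.zeros(d)        # total increment of each coordinate
    level2 = np.zeros((d, d))   # iterated integrals of dX^i dX^j
    for delta in np.diff(path, axis=0):
        # Chen's identity: append one straight segment with increment `delta`,
        # whose own signature is (delta, outer(delta, delta) / 2).
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return level1, level2

# Toy path: move right by 1, then up by 1.
coarse = [[0, 0], [1, 0], [1, 1]]
# The same path observed at twice the frequency.
fine = [[0, 0], [0.5, 0], [1, 0], [1, 0.5], [1, 1]]

l1, l2 = signature_depth2(coarse)
f1, f2 = signature_depth2(fine)
print(l1)                   # [1. 1.]
print(l2)                   # [[0.5 1. ] [0.  0.5]]
print(np.allclose(l2, f2))  # True: finer sampling gives the same signature
```

The antisymmetric part of the second level, here (1 - 0) / 2 = 0.5, is the area swept out between the two coordinates, which records the order in which they moved. Higher levels follow the same pattern at greater cost; dedicated signature libraries exist for that, but none is assumed here.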

Our previous research showed that path signature features can capture essential information, such as the profit and loss of unknown trading strategies, from high-frequency data without causing dimensionality issues.

Project aims

The ultimate goal of this project is to develop novel technologies which exploit path signature techniques (as detailed in ‘Explaining the science’), machine learning, Bayesian inference (which provides a way of combining new evidence with prior beliefs), financial mathematics, and specific domain knowledge. The methods and tools produced aim to improve the predictive power and robustness of financial data analysis.

We will also combine path signature techniques with the sentiment analysis techniques used in natural language processing, to provide a more effective way of extracting useful information from financial news and social media such as Twitter.

The project is being conducted in close collaboration with the financial industry.

Applications

The techniques and technologies produced have the potential to significantly benefit the financial industry. Although the project focuses on the problem of financial data forecasting, the research and the technologies it produces will also improve the statistical predictive power of the relevant machine learning methods. This work therefore has the potential to promote further theoretical research on the robustness of machine learning methods for data streams with low signal-to-noise ratios, such as those found in engineering problems.

Contact info

For more information, please contact The Alan Turing Institute
