Developing the mathematical foundations of learning for non-Euclidean objects

Date and Venue TBC at the British Library

Main organisers: Terry Lyons, Patrick Wolfe

The quantities and data types that we observe do not always present themselves directly in a non-degenerate Euclidean form.  Effective inference based on such observations can strongly benefit from, or even require, innovative mathematical thinking.

For example, a strong sequential structure permeates vast amounts of important data, introducing a natural non-commutative structure that can be useful for pattern recognition and inference.  Social and biological networks are modelled by graphs, raising questions about learning functions on graphs.  Effective analysis of proteins is emerging from cryo-electron microscopy (cryo-EM), which represents the protein as a function on a rotation group (see, e.g., recent work by Leslie Greengard).  Computational topology gives a means to structure high-dimensional data, manifold-valued data (shapes and images) and data that lie on stratified manifolds (such as trees; these are often called strongly non-Euclidean data, as no tangent space exists).

General algebraic structures are emerging as an important methodology in many fields. The question of how to represent data is deep, and concrete examples demonstrate that simply embedding such data into a classical Euclidean context can be deeply unsatisfactory.  Taking sequential data as an example: major progress in the modelling of sequential data came with the realisation that time series approximations will fail to capture an oscillatory stream, such as a fractional Brownian motion, well enough to predict its effect, however finely one samples the stream. Moreover, the order structure allows notions of causation that are not apparent in abstract data.

Recent years have seen much progress on both the theoretical and the applied side. The aim of this workshop is to connect theoretical advances from “pure” mathematics with real-world applications, as well as to connect research questions with demand from industry. Specifically, this workshop seeks to scope key data science questions in this area as follows:

  1. What fundamental ideas are required to unify the analysis of shapes, graphs, paths, manifolds, social data, and other data (e.g., protein data provided by cryo-EM)?
  2. How does one use the existing learning literature in non-numerical examples, such as the ‘bar codes’ of homology? How can one exploit mathematical structure in underlying data structures so as to be more effective in supervised and unsupervised learning?
  3. How can one create and implement data structures and algorithms that allow reduced dimension and greater scalability, both for non-Euclidean learning and for the analysis of non-Euclidean data (i.e., the difference between manifold learning and inference for data points that individually lie in a manifold)?
  4. What are the best (hierarchical) approaches to approximating functions on non-Euclidean objects? (Rough path theory shows that there are significant alternatives.)  What is the best approach to feature selection in this general setting?