‘Stochastic approximation’ methods are useful in optimising mathematical functions that arise in a range of applications such as machine learning. Despite successful implementations, significant technical and theoretical challenges remain which impede the practical implementation of such state-of-the-art algorithms to more diverse sets of real world problems. This project aims to develop and improve the mathematical ‘machinery’ that will help tackle these challenges.
Explaining the science
‘Stochastic approximation’ methods (such as stochastic gradient descent) are algorithms that can be used to optimise noisy functions. They track key parameters of interest in data streams which can have changing dynamics over time. The adaptive nature of stochastic approximation methods make them highly applicable in a range of applications, particularly in machine learning.
They are also closely related to MCMC (Markov chain Monte Carlo) and to other algorithms for sampling from a distribution, such as the stochastic gradient Langevin dynamics. Despite successful implementations, significant technical and theoretical challenges remain which impede the practical implementation of such state-of-the-art algorithms to more diverse sets of real world problems. In many cases (e.g. in engineering, finance) the function required to optimise the processes involved isn’t a continuous one, which makes the application of the desired algorithms much more challenging.
Another difficulty is posed by the fact that real data streams are often not ‘Markovian’ – i.e. the future state of the data stream will not be based solely on its present state, but on the past evolution of a potential multitude of related factors, e.g. in financial markets, weather forecasts, wear out of heavy machinery, traffic networks etc.
Public lecture by Prof Eric Moulines on the promises and pitfalls of stochastic gradient Langevin dynamics (SGLD)
This project aims to tackle the challenges detailed in ‘Explaining the science’ by developing and improving upon the currently available mathematical machinery behind stochastic approximation algorithms. This will allow these algorithms to be adaptable to a range of diverse real world data sets, particularly those found in engineering and machine learning.
The work requires substantial advances of the relevant ‘ergodic theory’ relating to these methods. The central concern of ergodic theory being the behaviour of dynamical systems that run for extended lengths of time, a real world example being industrial machinery lifespans. Tackling discontinuities and non-Markovian phenomena would make a significant step ahead in this area.
Stochastic approximation methods are the dominant method in the training of deep neural networks (DNNs), which can achieve superior performance compared to other methods in many data intensive tasks across multiple disciplines including engineering, finance, and medicine. As an example, DNN-based acoustic models have been shown to work excellently in automatic speech recognition systems which have value in a huge range of industrial activities.