Understanding the optimisation of machine learning computations from the point of view of numerical analysis has recently attracted a lot of interest. This project uses this point of view to reveal the strengths and weaknesses of such optimisation approaches, and to help design new and more efficient methods.
Explaining the science
The problem of computing expectations with respect to a probability distribution lies at the heart of many modern applications of applied mathematics. Although simple to state, it can be surprisingly difficult to deal with in practice, especially when the dimension of the space is high or when the underlying probability distribution is a posterior arising from a Bayesian inverse problem with abundant data.
Traditional computational statistics methods, such as Markov chain Monte Carlo, have difficulty dealing with the scenarios described above, in part because their emphasis is on providing unbiased statistical estimation. However, if one is willing to allow for some bias in the underlying calculations, then the range of methods that can be used to tackle the original problem increases: prime examples are methods inspired by the numerical analysis of stochastic (i.e. random) differential equations.
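As an illustration of trading bias for simplicity, the following minimal sketch applies an Euler-Maruyama discretisation to the Langevin stochastic differential equation, a scheme often called the unadjusted Langevin algorithm. The standard Gaussian target, the step size and the function names are assumptions chosen for illustration and are not taken from the project itself.

```python
import numpy as np

def grad_log_density(x):
    # Gradient of log pi for an assumed standard Gaussian target, pi(x) ∝ exp(-x^2 / 2)
    return -x

def unadjusted_langevin(n_steps, step_size, x0=0.0, seed=0):
    """Euler-Maruyama discretisation of dX = grad log pi(X) dt + sqrt(2) dW."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_steps)
    samples = np.empty(n_steps)
    x = x0
    for k in range(n_steps):
        # One explicit time step of size step_size, with no accept/reject correction
        x = x + step_size * grad_log_density(x) + np.sqrt(2.0 * step_size) * noise[k]
        samples[k] = x
    return samples

samples = unadjusted_langevin(n_steps=200_000, step_size=0.1)
# Monte Carlo estimate of E[X^2]; the exact value under the target is 1
estimate = np.mean(samples**2)
```

For this particular target, the discretised chain has stationary variance 1 / (1 - h/2) for step size h, so the estimate carries a small systematic bias that vanishes as h tends to zero. That is the kind of bias one accepts in exchange for a cheaper method.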
One of the most common problems in machine learning is the maximisation/minimisation of a loss function. However, this optimisation procedure can become very expensive when calibrating a model on large datasets. To reduce the computational cost, one replaces the true gradient of the loss function with a cheaper, stochastic estimate of it, typically computed from a random subset of the data. Similar ideas can be used when one wants to study the full posterior distribution rather than just its maximum a posteriori mode, which is all that standard optimisation approaches provide.
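The idea of replacing the true gradient with a stochastic one can be sketched with minibatch stochastic gradient descent on a least-squares problem. The synthetic data, the learning rate, the batch size and all names below are hypothetical choices made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (hypothetical): y = X @ w_true + noise
n_samples, n_features = 10_000, 3
w_true = np.array([1.0, -2.0, 0.5])
X = rng.standard_normal((n_samples, n_features))
y = X @ w_true + 0.1 * rng.standard_normal(n_samples)

def minibatch_gradient(w, batch_idx):
    # Gradient of the mean squared error on a random subset of the data;
    # its expectation over uniformly drawn indices is the full-data gradient
    Xb, yb = X[batch_idx], y[batch_idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(batch_idx)

def sgd(n_iters=2_000, batch_size=32, learning_rate=0.01):
    w = np.zeros(n_features)
    for _ in range(n_iters):
        # Each step touches batch_size rows rather than all n_samples rows
        batch_idx = rng.integers(0, n_samples, size=batch_size)
        w = w - learning_rate * minibatch_gradient(w, batch_idx)
    return w

w_hat = sgd()
```

Each iteration costs a fraction of a full gradient evaluation, at the price of noise in the iterates. Adding suitably scaled Gaussian noise to the same update is one way such schemes are adapted to explore a posterior distribution instead of converging to a single mode.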
The project aims to design computational methods that are optimal in the sense that they provide the 'best' answer for a given computational budget.
Cox processes provide useful and frequently applied models for aggregated spatial point patterns in which the aggregation is due to stochastic environmental heterogeneity. The class of Cox processes most widely used in applications is the log Gaussian Cox processes, i.e. Cox processes in which the logarithm of the intensity surface is a Gaussian process. In the stationary case, the distribution is completely characterised by the intensity and the pair correlation function of the Cox process. Estimating these quantities is therefore essential for making predictions from such a model.
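To make the two characterising quantities concrete, the sketch below simulates a discretised stationary log Gaussian Cox process on a one-dimensional interval and evaluates its intensity and pair correlation function in closed form. For a Gaussian field with mean mu, variance sigma2 and covariance function c(r), the intensity is exp(mu + sigma2 / 2) and the pair correlation function is exp(c(r)). The exponential covariance, the grid approximation, the parameter values and the function names are assumptions made for illustration.

```python
import numpy as np

def lgcp_intensity(mu, sigma2):
    # Intensity of a stationary LGCP: mean of the log-normal intensity surface
    return np.exp(mu + 0.5 * sigma2)

def lgcp_pair_correlation(r, sigma2, scale):
    # g(r) = exp(c(r)) with an assumed exponential covariance c(r) = sigma2 * exp(-|r| / scale)
    return np.exp(sigma2 * np.exp(-np.abs(r) / scale))

def simulate_lgcp_counts(n_cells=100, length=1.0, mu=3.0, sigma2=0.5, scale=0.1, seed=0):
    """Simulate cell counts of a grid-discretised stationary LGCP on [0, length]."""
    rng = np.random.default_rng(seed)
    cell_width = length / n_cells
    centres = (np.arange(n_cells) + 0.5) * cell_width
    # Covariance matrix of the Gaussian field evaluated at the cell centres
    dists = np.abs(centres[:, None] - centres[None, :])
    cov = sigma2 * np.exp(-dists / scale)
    # Sample the field via a Cholesky factor (small jitter for numerical stability)
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(n_cells))
    field = mu + chol @ rng.standard_normal(n_cells)
    # Conditional on the field, cell counts are independent Poisson variables
    return rng.poisson(np.exp(field) * cell_width)

counts = simulate_lgcp_counts()
```

Averaging the total count over many independent simulations should approach lgcp_intensity(mu, sigma2) times the interval length, which gives a simple sanity check on any estimator of the intensity.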