Introduction

Recently, diffusion processes have played a critical role in multiple endeavours in machine learning, ranging from novel generative models to nonconvex optimisation. In the context of generative modelling, diffusion-based generative models were introduced in [1, 2] as an alternative to more traditional methods such as variational autoencoders (VAEs) and generative adversarial networks (GANs). Diffusion models quickly became popular, displayed impressive results, and outperformed their competitors on a variety of tasks; see, e.g., [3]. In nonconvex optimisation, these processes are deployed and analysed as optimisation algorithms, providing theoretically sound optimisation methods for machine learning; see, e.g., [4, 5]. All of these methods share a key similarity: they are built on simple principles of diffusion processes and are conceptually straightforward to understand and deploy. Their success has been demonstrated on a large number of tasks, ranging from the generation of multimedia data (such as faces, videos, and audio) to molecules, and they are used in the training of neural networks as well as in editing and controllable generation of data. As a rapidly developing research area, these models deserve close attention so that we understand how and why they work, and hence better understand their limits, applicability, and risks.

Explaining the science

Diffusion models are, at their core, simulated stochastic processes whose output can be an estimate of a global minimum or can be designed to match the properties of a given dataset. More precisely, in the context of sampling, some of these models aim at approximating the gradient of the log data density, the so-called score (e.g. [1]), which can then be used to generate new samples via a simple Langevin diffusion. Other methods "noise" the data until an approximately Gaussian distribution is obtained and then reverse this process to generate a sample from a Gaussian initialisation [2]. Either way, these models can produce new samples straightforwardly using a simple algorithmic procedure. The core ideas rely on diffusion processes, a well-studied topic in the probability theory and stochastic analysis literature.
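The Langevin sampling idea above can be sketched in a few lines. The following is a minimal illustration, not the algorithm of [1] or [2]: it samples from a one-dimensional standard Gaussian whose score is known in closed form, whereas a score-based generative model would replace the analytic score below with a learned approximation of the score of the data distribution.

```python
import numpy as np

def score(x):
    # Score of N(0, 1): d/dx log p(x) = -x. In a generative model this
    # would be a learned score network, not a closed-form expression.
    return -x

rng = np.random.default_rng(0)
step = 1e-2                       # step size of the discretised diffusion
x = rng.normal(size=5000) * 3.0   # arbitrary (wide) initialisation

for _ in range(2000):
    # Unadjusted Langevin update:
    # x <- x + step * score(x) + sqrt(2 * step) * Gaussian noise
    x = x + step * score(x) + np.sqrt(2 * step) * rng.normal(size=x.shape)

# After many iterations the particles are approximately N(0, 1) samples.
print(float(x.mean()), float(x.std()))
```

Despite its simplicity, this single update rule is the generation mechanism once a score estimate is available, which is why these models are algorithmically so easy to deploy.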

In a conceptually similar way, diffusion-based nonconvex optimisers are processes that aim at minimising a given cost function to find the relevant minima. Designed correctly, these processes are global optimisers with provable guarantees, and they can replace current optimisers that are based on ad hoc results and heuristics.
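The optimisation use of the same dynamics can be illustrated on a toy problem. The sketch below (a generic Langevin optimiser, not the specific algorithms of [4] or [5]) minimises the nonconvex double-well cost f(x) = (x² − 1)², whose global minima sit at x = ±1; the inverse temperature beta controls how tightly the iterates concentrate around the minima.

```python
import numpy as np

def grad_f(x):
    # Gradient of the double-well cost f(x) = (x^2 - 1)^2.
    return 4.0 * x * (x**2 - 1.0)

rng = np.random.default_rng(1)
step, beta = 1e-3, 50.0           # step size and inverse temperature
x = rng.normal(size=2000) * 2.0   # random starts, some far from the minima

for _ in range(5000):
    # Langevin optimisation update: gradient descent plus small noise,
    # targeting the Gibbs measure proportional to exp(-beta * f).
    x = x - step * grad_f(x) + np.sqrt(2.0 * step / beta) * rng.normal(size=x.shape)

# Fraction of iterates that ended near one of the two global minima.
near_minima = float(np.mean(np.abs(np.abs(x) - 1.0) < 0.3))
print(near_minima)
```

The added noise is what distinguishes this from plain gradient descent: it allows iterates to escape poor regions of the landscape, and at low temperature the stationary distribution concentrates on the global minima, which is the basis of the provable guarantees mentioned above.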

Challenge aims

These models exhibit a variety of interesting phenomena which can be studied from a stochastic perspective. Diffusion models have proved widely successful, but their theoretical properties are not well understood. The few existing works in this area rely on restrictive mathematical assumptions about the models used, which are unlikely to hold in real applications. Our challenge has two main aims that contribute to developments in this area:

1)     We aim at classifying these models and identifying a generic version of them to be analysed, which can shed light on many of the empirical phenomena observed in this literature. Our team consists of experts in probability theory and stochastic analysis who also work in this area, which allows us to identify theoretical problems and begin tackling them. One example is the analysis of energy-based models, which combine diffusion-based sampling (e.g. MCMC) with optimisation, resulting in a path-dependent diffusion that is notoriously difficult to analyse.

2)     We aim at organising a coding week associated with our challenges to better understand the numerical behaviour of diffusion models in particular scenarios of theoretical interest. New ideas for tackling various issues around the efficient training of these models will also be tested, which will help the creation of new diffusion models.

Potential for impact

Generative models and nonconvex optimisers are fundamental pillars of modern machine learning, and any advances resulting from our event will contribute to both fields.

Given the fast-paced industry adoption of these models for the generation of text, image, and audio data, as well as their use in editing and controllable generation, the new algorithms and improved understanding resulting from our event hold the promise of substantial impact on industrial applications as well as on follow-up scientific developments.

Related activities

The first week of this event is scheduled to take place from 6 to 10 June 2022. The associated second, practical week will be held from 5 to 9 September 2022, with a workshop on the results on the final day.

References

[1] Song, Y., & Ermon, S. (2019). Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 32.

[2] Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2020, September). Score-Based Generative Modeling through Stochastic Differential Equations. In International Conference on Learning Representations.

[3] Dhariwal, P., & Nichol, A. (2021). Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34.

[4] Zhang, Y., Akyildiz, Ö. D., Damoulas, T., & Sabanis, S. (2019). Nonasymptotic estimates for Stochastic Gradient Langevin Dynamics under local conditions in nonconvex optimization. arXiv preprint arXiv:1910.02008.

[5] Lim, D. Y., & Sabanis, S. (2021). Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks. arXiv preprint arXiv:2105.13937.

Organisers

Dr Ömer Deniz Akyildiz (The Alan Turing Institute)

Professor Sotirios Sabanis (University of Edinburgh)

Professor Ioannis Kosmidis (TMCF Lead)

Charlie Thomas (The Alan Turing Institute)

Patty Holley (The Alan Turing Institute)
