Introduction

Statistical Mechanics (SM) provides a probabilistic formulation of the macroscopic behaviour of systems made of many microscopic entities, possibly interacting with each other. This discipline is one of the pillars of Theoretical Physics and arose in the past century to characterise emergent collective phenomena, of which phase transitions constitute a paradigmatic example. Remarkably, typical features of biological neural networks (such as memory, computation, and other emergent skills) can be framed within SM once a mathematical model of their elementary constituents (i.e. neurons equipped with their axons, synapses, etc.) is available. In fact, no single neuron can recognise an information pattern: this ability genuinely emerges from the mutual interaction among neurons. It is thus not surprising that, since the pioneering paper by Amit, Gutfreund and Sompolinsky [1] in the post-winter era of AI, the SM of Disordered and Complex Systems has played a pivotal role in understanding information processing in Artificial Neural Networks (ANNs). Indeed, it is expected to play a crucial role en route toward Explainable Artificial Intelligence (XAI), even in the modern formalisation of the new generation of (possibly “deep”) neural networks and learning machines [2,3]. The present workshop will retain an SM perspective, mixing mathematical and theoretical physics with machine learning.

Explaining the science

In the zoo of task-specific ANNs, Restricted Boltzmann Machines (RBMs) are one of the cornerstones of modern learning architectures. An RBM is a two-layer (shallow) neural network: its visible layer is fed with empirical data, and its hidden layer infers correlations in the dataset, searching for patterns to store. Furthermore, deep belief networks are nothing but a series of RBMs stacked one on top of another.

In the biological counterpart, the Hebbian learning rule is the backbone of the Hopfield model, which is regarded as the “harmonic oscillator” of associative memory and pattern recognition. In SM jargon, the Hopfield model is a “spin-glass” and, in fact, scientists have been able to use analytical tools inspired by the SM of disordered and complex systems (e.g., replica trick, cavity fields, stochastic stability) to investigate its features.
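As an illustration of the Hebbian rule at work, the following is a minimal sketch of a Hopfield network that stores a few random binary patterns and retrieves one of them from a corrupted copy. It is not taken from the cited works: the function names (hebbian_couplings, retrieve, overlap), the sizes N and P, the 10% corruption level, and the zero-temperature asynchronous update are all assumptions chosen for illustration.

```python
import random


def hebbian_couplings(patterns):
    """Hebb's rule: J[i][j] = (1/N) * sum over patterns of xi[i] * xi[j], zero diagonal."""
    n = len(patterns[0])
    J = [[0.0] * n for _ in range(n)]
    for xi in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    J[i][j] += xi[i] * xi[j] / n
    return J


def retrieve(J, state, sweeps=10):
    """Zero-temperature asynchronous dynamics: align each neuron with its local field."""
    s = list(state)
    n = len(s)
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            field = sum(J[i][j] * s[j] for j in range(n))
            new_value = 1 if field >= 0 else -1
            if new_value != s[i]:
                s[i] = new_value
                changed = True
        if not changed:  # fixed point reached
            break
    return s


def overlap(a, b):
    """Normalised overlap between two +/-1 configurations (1.0 means identical)."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


# Illustrative sizes (assumptions): N neurons, P random patterns, low storage load P/N.
rng = random.Random(0)
N, P = 100, 3
patterns = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(P)]
J = hebbian_couplings(patterns)

# Corrupt the first pattern by flipping 10 of its entries, then let the network relax.
noisy = list(patterns[0])
for i in rng.sample(range(N), 10):
    noisy[i] = -noisy[i]
recalled = retrieve(J, noisy)

print("overlap before retrieval:", overlap(noisy, patterns[0]))
print("overlap after retrieval: ", overlap(recalled, patterns[0]))
```

At such a low storage load the corrupted input is expected to relax back toward the stored pattern, which is the sense in which the network acts as an associative memory.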

In recent times, it has become clear that Boltzmann machines and Hopfield models are intimately related [4,5]: roughly speaking, once an RBM is trained, in its subsequent usage it behaves as a Hopfield model for pattern recognition. This allows an arsenal of techniques originally developed in the SM of disordered systems [6] (hence suited to the Hopfield model) to be imported into machine learning, in order to inspect and quantify the emergent properties of RBMs and their generalisations. This bridge between the two poles of biological versus artificial information processing will guide the present workshop.
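The relation can be made concrete in the simplest setting, sketched here under stated assumptions: binary visible units \sigma_i = \pm 1, hidden units z_\mu with a standard Gaussian prior, weights \xi_i^\mu, and inverse temperature \beta (this notation is introduced for illustration; the cited works treat more general priors). Integrating out the hidden layer with the Gaussian identity \int dz\, e^{-z^2/2 + az}/\sqrt{2\pi} = e^{a^2/2} leaves a Boltzmann weight on the visible layer whose Hamiltonian is the Hopfield one with Hebbian couplings:

```latex
% Assumed setting: \sigma_i \in \{-1,+1\} (visible units), z_\mu \sim \mathcal{N}(0,1) (hidden units),
% RBM weights \xi_i^\mu, inverse temperature \beta, N visible and P hidden units.
\begin{align}
P(\sigma) &\propto \prod_{\mu=1}^{P} \int \frac{dz_\mu}{\sqrt{2\pi}}\,
  \exp\!\Big( -\frac{z_\mu^2}{2}
  + \sqrt{\frac{\beta}{N}}\, z_\mu \sum_{i=1}^{N} \xi_i^\mu \sigma_i \Big) \\
&= \exp\!\Big( \frac{\beta}{2N} \sum_{\mu=1}^{P}
  \Big( \sum_{i=1}^{N} \xi_i^\mu \sigma_i \Big)^{2} \Big)
 = \exp\!\big( -\beta H(\sigma) \big), \\
H(\sigma) &= -\frac{1}{2N} \sum_{i,j=1}^{N}
  \Big( \sum_{\mu=1}^{P} \xi_i^\mu \xi_j^\mu \Big) \sigma_i \sigma_j .
\end{align}
```

In this reading the weights of the trained RBM play the role of the stored patterns, and each hidden unit corresponds to one pattern. The diagonal terms i = j contribute only a constant to H and do not affect the distribution.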

 

Figure: Stylised representation of the generalised Hopfield network (left) and its dual generalised (restricted) Boltzmann machine (right), namely the three-partite spin-glass under study. In machine learning jargon these parties are called layers; here, they are respectively the visible, hidden and spectral layers. [9]

Challenge aims

The main focus of this workshop is understanding the skills that deep artificial networks spontaneously show as their control parameters are varied. At present, it is not known exactly what these skills are, nor how many emergent properties we should expect from these networks. However, there are clear indications of “where to sharpen the investigation”: in particular, it is known that, unlike shallow networks, deep networks may have variable signal-to-noise ratios, that is, they can autonomously sacrifice storage capacity to enhance their threshold for signal detection [7]. This is a very recent result that deserves further investigation, since applications of this feature, once it can be properly controlled, could be valuable and have the potential to revolutionise disciplines where early signal detection is a priority.

Another aspect we aim to investigate is the spectral properties of these networks and their relation to the ability to avoid over-fitting even when hyper-parametrised [8,9]. On random and structureless datasets, the mechanism by which these networks destroy the spin-glass state, obtaining extensive free storage as a reward, is clear: the critical storage load shifts from 0.14 bits per neuron and saturates at 1 bit per neuron once these spectral mechanisms are at work. Control on structured datasets, however, is lacking at present.

Finally, with the current standard toolbox of disordered SM, we can rigorously address only networks whose underlying neural and synaptic stochastic dynamics obey detailed balance; hence feed-forward networks (with their chain rule for learning) are ruled out. These could possibly be included in an SM treatment by relaxing the requirement of detailed balance, but this implies a shift from glassy equilibrium statistical mechanics to fully off-equilibrium statistical mechanics. Although there is a long way to go, it should in principle be possible to pave such a route toward a complete SM perspective on Theoretical Artificial Intelligence [10,11,12].

Potential for impact

XAI is a central theme for many research teams in machine learning worldwide. The present workshop aims to improve our understanding of AI decision processes by framing their underlying mechanisms in a scientific perspective. This will help the transition from matte-box to clear-box machine learning algorithms.

 

Related activities

The exploration of this novel research area will revolve around a two-week workshop funded in the context of the 'Theory and Methods Challenge Fortnights' Turing grant scheme. In addition, a seminar series exploring the ideas of Φ-ML in the engineering context is currently being organised and its provisional agenda is available to view here.

Recent updates

The workshop will take place from Monday 17 January to Friday 28 January 2022.

 

Organisers

Elena Agliari (“Sapienza” University of Rome)
Adriano Barra (University of Salento)
Marya Bazzi (University of Warwick, The Alan Turing Institute)
Andrea Pizzoferrato (University of Bath, The Alan Turing Institute)

 

References

[1] Amit, Daniel J., Hanoch Gutfreund, and Haim Sompolinsky. "Storing infinite numbers of patterns in a spin-glass model of neural networks." Physical Review Letters 55.14 (1985): 1530.

[2] Coolen, Anthony CC, Reimer Kühn, and Peter Sollich. Theory of neural information processing systems. OUP Oxford, 2005.

[3] Agliari, Elena, Adriano Barra, Peter Sollich, and Lenka Zdeborova. "Machine learning and statistical physics: theory, inspiration, application." Journal of Physics A: Special 2020 (2020).

[4] Barra, Adriano, Giuseppe Genovese, Peter Sollich, and Daniele Tantari. "Phase diagram of restricted Boltzmann machines and generalized Hopfield networks with arbitrary priors." Physical Review E 97 (2018): 022310.

[5] Tubiana, Jérôme, and Rémi Monasson. "Emergence of compositional representations in restricted Boltzmann machines." Physical Review Letters 118.13 (2017): 138301.

[6] Decelle, Aurélien, et al. "Inference and phase transitions in the detection of modules in sparse networks." Physical Review Letters 107.6 (2011): 065701.

[7] Agliari, E., F. Alemanno, A. Barra, M. Centonze, and A. Fachechi. "Neural networks with a redundant representation: detecting the undetectable." Physical Review Letters 124.2 (2020): 028301.

[8] Decelle, Aurélien, Giancarlo Fissore, and Cyril Furtlehner. "Spectral dynamics of learning in restricted Boltzmann machines." EPL (Europhysics Letters) 119.6 (2017): 60001.

[9] Fachechi, Alberto, Elena Agliari, and Adriano Barra. "Dreaming neural networks: forgetting spurious memories and reinforcing pure ones." Neural Networks 112 (2019): 24-40.

[10] Del Prete, Valeria, and Anthony CC Coolen. "Non-equilibrium statistical mechanics of recurrent networks with realistic neurons." Neurocomputing 58 (2004): 239-244.

[11] Coolen, Ton, and David Sherrington. "Dynamics of attractor neural networks." Mathematical Approaches to Neural Networks, Elsevier 51 (1993): 293-306.

[12] Mozeika, Alexander, Bo Li, and David Saad. "Space of Functions Computed by Deep-Layered Machines." Physical Review Letters 125.16 (2020): 168301.
