Mathematical Underpinnings in Digital Twins

Project status

Ongoing

Introduction

One of the inherent parts of a Digital Twin is a computational model that has been validated as suitable for use in an engineering context. The potential cost reduction that a digital prototype can offer makes the development of such models highly compelling. However, when a system with a validated model undergoes modifications, the question arises of whether error bounds on the predictions can be established with enough confidence to forego prototyping and testing of the modified system. This issue could potentially be addressed via the kind of generalisation inequalities found in Statistical Learning Theory (SLT).

Explaining the science

Statistical Learning Theory (SLT) is a body of knowledge focused on characterising the performance of learning algorithms within a rigorous mathematical framework. Ultimately, the goal of SLT is to contribute to the design of better algorithms that can generalise well from a given set of data.

Considering that the computational model is an inherent part of a Digital Twin, it makes sense to delve deeper into the limitations that such a model may impose on the representativeness of the Digital Twin. Specifically, the premise of this project is to address problems such as the following:

“Consider a structure S with a model (experimentally) validated to a given precision, with prediction errors bounded above by e. Suppose one wishes to modify the structure to a form S’; can one establish error bounds e’ on a model of S’, with enough confidence to forego prototyping and testing of S’?”

The generalisation bounds found in SLT provide a way to quantify (or bound) the prediction errors of models when they undergo modifications. These bounds estimate the error on unseen test data of a Machine Learning (ML) model trained on a finite sample of training data. Under certain assumptions, these generalisation bounds typically take the form,
R ≤ R_emp + Φ_ML(h, n)

where R denotes the guaranteed classification error over the test data, R_emp is the empirical classification error over the training data, and Φ_ML is the confidence interval, defined by the complexity h of the learner and the size n of the training data set.
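As a concrete illustration, the sketch below evaluates one classical instance of such a bound, the Vapnik–Chervonenkis (VC) bound, in which the complexity h is the VC dimension of the learner. The numerical values are purely hypothetical, and this is only one of several possible forms of Φ_ML:

```python
import math

def vc_confidence(h, n, eta=0.05):
    """One classical form of Phi_ML(h, n): the VC confidence term for a
    learner of VC dimension h trained on n samples, holding with
    probability at least 1 - eta over the draw of the training set."""
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)

def guaranteed_risk(r_emp, h, n, eta=0.05):
    """Evaluate the bound R <= R_emp + Phi_ML(h, n)."""
    return r_emp + vc_confidence(h, n, eta)

# Hypothetical numbers: 5% empirical error, VC dimension 10, 10,000 samples
print(guaranteed_risk(0.05, h=10, n=10_000))
```

Note that the confidence term shrinks as the training set grows and inflates as the learner's complexity h increases, which is the trade-off the bound is designed to capture.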

However, these inequalities are derived under the assumption that the training and test sets are drawn from the same underlying probability distribution. The problem at hand is, unfortunately, more complex, as one cannot expect data from structures S and S’ to be drawn from the same distribution. A possible solution to address this challenge is to frame the problem from a Transfer Learning (TL) perspective, where training and test data refer to different domains; namely, a source domain and a target domain. Some initial work based on the idea of extending the generalisation bounds has been pursued by Ben-David et al., so that the relevant bounds take the form,
R_t ≤ R_s + Φ_TL(h, n)

where R_t now denotes the classification error over the target (test) data, R_s is the classification error on the source (training) data, and the confidence interval Φ_TL is now defined by the complexity of the TL algorithm.
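To give a flavour of how the extra term in such a TL bound can be estimated in practice, the sketch below computes the "proxy A-distance" used by Ben-David et al., which measures how distinguishable the source and target data are via the error of a classifier trained to tell them apart. The 1-D feature, the Gaussian data, and the trivial threshold classifier are all hypothetical stand-ins for illustration only:

```python
import numpy as np

def proxy_a_distance(domain_err):
    """Proxy A-distance: 2 * (1 - 2 * err), where err is the error of a
    classifier trained to distinguish source from target samples.
    0 => domains look identical; 2 => domains perfectly separable."""
    return 2.0 * (1.0 - 2.0 * domain_err)

rng = np.random.default_rng(0)
# Hypothetical 1-D feature (e.g. a measured natural frequency) on S and S'
source = rng.normal(loc=0.0, scale=1.0, size=1000)   # structure S
target = rng.normal(loc=0.5, scale=1.0, size=1000)   # modified structure S'

# A minimal domain classifier: threshold at the midpoint of the two means
threshold = 0.5 * (source.mean() + target.mean())
err = 0.5 * ((source > threshold).mean() + (target <= threshold).mean())
print(proxy_a_distance(err))
```

The larger this distance, the looser the transfer bound becomes, reflecting the intuition that predictions for S’ become less trustworthy as S’ departs further from S.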

Several challenges emerge in attempting to apply the theory to real engineering applications. While the form of these inequalities may seem simple, defining the confidence term for a given application is usually extremely difficult, and likely to be intractable for highly complex structures. Nevertheless, gaining a better understanding of the confidence interval in this context could prove beneficial in the development of better computational models.
 

Project aims

One aim of this project is to extend the generalisation bounds to address real engineering problems. Given the difficulty in defining the complexity of the learner, and the corresponding confidence interval, the necessary research will involve experimental work on a hierarchy of structures, ranging from the very simple to the complex.

The project encompasses fundamental research in the theory and application of ML/TL, with specific focus on generalisation bounds derived from SLT. This theoretical foundation will be experimentally validated on real-life structures across a range of scales.
 

Applications

The concepts stemming from SLT are explored in the context of Structural Health Monitoring (SHM), focussed on infrastructure, to optimise the design/prototype cycle. Concretely, the goal is to reduce the need for full-scale prototype testing, given its high cost and time implications in the context of SHM.

Previous contributors