Introduction
Machine learning systems are trained on a particular distribution of data and are therefore designed to work well with inputs that closely match this training distribution. But what happens when a malicious user tries to deceive and compromise a system by manipulating that data?
Project aims
Adversarial settings pose a challenge in machine learning and have been studied within various frameworks. Examples of adversarial data include messages crafted to evade spam filters, malware code hidden within network packets, and fake biometric traits used to impersonate legitimate users.
With deep learning now the dominant machine learning framework in many applications, studying adversarial settings in deep learning becomes a necessity.
This project will use model uncertainty (the level of uncertainty in a model's parameters) to identify adversarial examples in standard deep learning architectures used in the field, examining whether a model becomes more uncertain when given adversarial inputs.
More formally, recent results that cast many existing deep learning architectures as approximate Bayesian models will be used to derive epistemic uncertainty estimates.
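As a concrete illustration, the sketch below shows one way such epistemic uncertainty estimates could be obtained in practice with Monte Carlo dropout: dropout is kept active at test time and several stochastic forward passes are averaged. It is a minimal sketch assuming PyTorch; the small placeholder network, layer sizes, and sample count are illustrative rather than the architectures the project will ultimately use.

```python
# Minimal sketch of Monte Carlo dropout for epistemic uncertainty (assumes PyTorch).
# The small network below is a placeholder; the project would instead use standard
# architectures that already contain dropout layers.
import torch
import torch.nn as nn


class SmallClassifier(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 256), nn.ReLU(),
            nn.Dropout(p=0.5),                # kept active at test time
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.net(x)


def mc_dropout_predict(model, x, n_samples=50):
    """Run several stochastic forward passes with dropout enabled and return
    the mean predictive distribution plus its predictive entropy."""
    model.train()  # keeps dropout stochastic; this toy model has no BatchNorm, so this is safe
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )                                     # shape: (n_samples, batch, classes)
    mean_probs = probs.mean(dim=0)
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    return mean_probs, entropy


model = SmallClassifier()
x = torch.randn(4, 1, 28, 28)                 # dummy batch standing in for real images
mean_probs, entropy = mc_dropout_predict(model, x)
print(entropy)
```

The predictive entropy computed here is just one possible uncertainty summary; the working hypothesis is that adversarial inputs would yield noticeably higher values than clean inputs if model uncertainty is a useful detection signal.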
Applications
The project will concentrate primarily on computer vision and image classification, using large models trained on the ImageNet database.
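For instance, a large ImageNet-trained classifier whose head contains dropout layers could be loaded and evaluated stochastically as sketched below. The snippet assumes a recent torchvision version; the choice of VGG-16 and the weights identifier are illustrative only, not a commitment to a particular model.

```python
# Hedged sketch: loading a large ImageNet-trained classifier and reusing its
# dropout layers for Monte Carlo estimates (assumes torchvision >= 0.13).
import torch
from torchvision import models

weights = models.VGG16_Weights.IMAGENET1K_V1   # illustrative weight choice
model = models.vgg16(weights=weights)          # VGG-16's classifier head contains dropout layers
preprocess = weights.transforms()              # the matching ImageNet preprocessing pipeline

model.train()                                  # keep dropout stochastic (VGG-16 has no BatchNorm)
x = torch.randn(1, 3, 224, 224)                # dummy batch in place of a real preprocessed image
with torch.no_grad():
    probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(20)])
mean_probs = probs.mean(dim=0)                 # Monte Carlo estimate of the predictive distribution
```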