Variational autoencoders (VAEs) have recently been shown to be vulnerable to adversarial attacks, wherein they are fooled into reconstructing a chosen target image. However, how to defend against such attacks remains an open problem. We address this issue by introducing methods for producing adversarially robust VAEs. Namely, we first demonstrate that methods used to obtain disentangled latent representations produce VAEs that are more robust to these attacks. However, this robustness comes at the cost of reconstruction quality. We therefore introduce a new hierarchical VAE, the Seatbelt-VAE, which produces high-fidelity reconstructions while remaining adversarially robust. We confirm the capabilities of the Seatbelt-VAE on several different datasets and against current state-of-the-art VAE adversarial attacks.

Citation information

Willetts, M., Camuto, A., Rainforth, T., Roberts, S., Holmes, C. (2021). Improving VAEs’ Robustness to Adversarial Attack. International Conference on Learning Representations (ICLR).

Turing-affiliated authors