Resilience to Multiple Attacks via Adversarially Trained MIMO Ensembles

29 Sep 2021 · Ruqi Bai, David I. Inouye, Saurabh Bagchi

While ensemble methods have been widely used for robustness against random perturbations (i.e., the average case), ensemble approaches for robustness against adversarial perturbations (i.e., the worst case) have remained elusive despite multiple prior attempts. We show that ensemble methods can improve adversarial robustness to multiple attacks if the ensemble is "adversarially diverse," which is defined by two properties: 1) the sub-models are adversarially robust themselves, and yet 2) adversarial attacks do not transfer easily between sub-models. Although creating such an ensemble might seem computationally expensive at first glance, we demonstrate that an adversarially diverse ensemble can be trained with minimal computational overhead via a Multiple-Input Multiple-Output (MIMO) model. Specifically, we propose to train a MIMO model with adversarial training (MAT), where each sub-model can be trained on a different attack type. When computing gradients for generating adversarial examples during training, we use the gradient with respect to the ensemble objective. This has a two-fold benefit: 1) it requires only one backward pass, and 2) the cross-gradient information between the models promotes robustness against transferable attacks. We empirically demonstrate that MAT produces an adversarially diverse ensemble that significantly improves performance over single models or vanilla ensembles while remaining comparable to previous state-of-the-art methods. On MNIST, we obtain $99.5\%$ clean accuracy and $(88.6\%, 57.1\%, 71.6\%)$ robust accuracy against $(\ell_\infty, \ell_2, \ell_1)$ attacks; on CIFAR10, we achieve $79.7\%$ clean accuracy and $(47.9\%, 61.8\%, 47.6\%)$ against the same attacks.
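The core training-time idea — generate adversarial examples for all sub-models from a single gradient of the shared ensemble objective, with each sub-model assigned its own attack type — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the "backbone" here is a toy shared linear layer, and all names (`mimo_forward`, `perturb`, the ℓ∞/ℓ2 step rules, the budgets) are hypothetical stand-ins for the actual deep MIMO network and PGD-style attacks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MIMO "backbone": one shared linear layer maps the concatenated
# inputs of M sub-models to M logit vectors (hypothetical shapes).
M, D, C = 2, 4, 3            # sub-models, input dim, classes
W = rng.normal(size=(M * D, M * C)) * 0.1

def mimo_forward(xs):
    """xs: (M, D) stacked inputs -> (M, C) stacked logits, one shared pass."""
    z = xs.reshape(-1) @ W
    return z.reshape(M, C)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_loss_grad(xs, ys):
    """Cross-entropy summed over sub-models (the ensemble objective).

    Returns the loss and d(loss)/d(xs). Because we differentiate the
    *ensemble* objective, one gradient computation yields perturbation
    directions for every sub-model at once (the "one backward pass"
    benefit), and each direction carries cross-gradient information
    from the other sub-models through the shared weights."""
    logits = mimo_forward(xs)
    p = softmax(logits)
    loss = -np.log(p[np.arange(M), ys]).sum()
    dlogits = p.copy()
    dlogits[np.arange(M), ys] -= 1.0              # d(CE)/d(logits)
    dxs = (W @ dlogits.reshape(-1)).reshape(M, D)  # chain rule through W
    return loss, dxs

def perturb(xs, ys, eps=(0.1, 0.1)):
    """One FGSM-style ascent step per sub-model, each under its own
    norm geometry (sign step ~ l_inf, normalized step ~ l_2)."""
    _, g = ensemble_loss_grad(xs, ys)
    adv = xs.copy()
    adv[0] += eps[0] * np.sign(g[0])                           # l_inf sub-model
    adv[1] += eps[1] * g[1] / (np.linalg.norm(g[1]) + 1e-12)   # l_2 sub-model
    return adv

xs = rng.normal(size=(M, D))
ys = np.array([0, 1])
clean_loss, _ = ensemble_loss_grad(xs, ys)
adv_loss, _ = ensemble_loss_grad(perturb(xs, ys), ys)
assert adv_loss > clean_loss   # ascent step increases the ensemble loss
```

In actual adversarial training, the loop would then take a parameter-gradient step on the loss at `perturb(xs, ys)` rather than at the clean inputs; here only the attack-generation half is shown.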
