Certified robustness against physically-realizable patch attack via randomized cropping
This paper studies a certifiable defense against adversarial patch attacks on image classification. Our approach classifies random crops of the original image independently and assigns the image the class chosen by majority vote over these crops. This requires minimal changes to the training process: only the crop classification model needs to be trained, and it can be trained in a standard manner without explicit adversarial training. Leveraging the fact that a patch attack can only influence some pixels of the image, and therefore only the crops that contain those pixels, we derive certified robustness bounds on the resulting classification. Our method is particularly effective when realistic physical transformations, such as affine transformations, are applied to the adversarial patch. Such transformations occur naturally when an adversarial patch is physically introduced into a scene. Our method improves upon the current state of the art in defending against patch attacks on CIFAR-10 and ImageNet, in terms of both certified accuracy and inference time.
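The voting idea can be illustrated with a minimal sketch. It is not the paper's bound: it assumes a square, axis-aligned patch with no transformation, and it enumerates every crop position instead of sampling crops at random, so the certificate is a simple counting argument. The names `max_affected_crops`, `vote_over_crops`, `is_certified`, and `classify_crop` are hypothetical and chosen for illustration only.

```python
import numpy as np

def max_affected_crops(image_size, crop_size, patch_size):
    """Worst-case number of crop positions that overlap a square patch.

    Along one axis, a crop of side c overlaps a patch of side p for at most
    c + p - 1 top-left offsets, capped by the number of valid offsets.
    """
    h, w = image_size
    rows = min(crop_size + patch_size - 1, h - crop_size + 1)
    cols = min(crop_size + patch_size - 1, w - crop_size + 1)
    return rows * cols

def vote_over_crops(image, classify_crop, crop_size, num_classes):
    """Classify every crop independently and return per-class vote counts.

    Assumption: all crop positions are enumerated (the paper samples crops).
    """
    h, w = image.shape[:2]
    counts = np.zeros(num_classes, dtype=int)
    for i in range(h - crop_size + 1):
        for j in range(w - crop_size + 1):
            crop = image[i:i + crop_size, j:j + crop_size]
            counts[classify_crop(crop)] += 1
    return counts

def is_certified(counts, affected):
    """The top class survives if its lead exceeds twice the affected votes.

    Each affected crop can remove one vote from the top class and add one
    to the runner-up, so the margin shrinks by at most 2 * affected.
    """
    ordered = np.sort(counts)[::-1]
    return bool(ordered[0] - ordered[1] > 2 * affected)

# Toy usage: a stand-in "classifier" that votes class 1 whenever a crop
# contains any bright pixel, and class 0 otherwise.
def toy_classifier(crop):
    return int(crop.max() > 0.5)

image = np.zeros((32, 32))
image[10:15, 10:15] = 1.0  # a 5x5 "patch" placed in the scene
counts = vote_over_crops(image, toy_classifier, crop_size=8, num_classes=2)
affected = max_affected_crops(image.shape, crop_size=8, patch_size=5)
certified = is_certified(counts, affected)
```

Under these assumptions, smaller crops reduce the fraction of votes a patch can touch, at the cost of giving the crop classifier less context per prediction.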