Training Universal Adversarial Perturbations with Alternating Loss Functions

Despite their success, deep learning models have been shown to be vulnerable to carefully crafted perturbations. Moreover, it is possible to change a network's prediction on virtually any image by learning a single universal adversarial perturbation (UAP). In this work, we propose three ways of training UAPs that attain a predefined fooling rate while jointly minimizing the $L_2$ or $L_\infty$ norm of the perturbation. To stabilize the attack around the predefined fooling rate, we integrate an alternating loss function scheme that switches the active loss function based on a given condition. Specifically, the proposed loss functions are Batch Alternating Loss, Epoch-Batch Alternating Loss, and Progressive Alternating Loss. In addition, we empirically observe that UAPs learned by minimization attacks contain strong image-like features around the edges, so we propose integrating a circular masking operation into training to further reduce visible perturbations. The proposed $L_2$ Progressive Alternating Loss method outperforms popular attacks by achieving a higher fooling rate at equal $L_2$ norm. Furthermore, Filtered Progressive Alternating Loss reduces the $L_2$ norm by an additional 33.3% at the same fooling rate. When optimized with respect to $L_\infty$, Progressive Alternating Loss stabilizes at the desired fooling rate of 95% with only 1 percentage point of deviation, even though the $L_\infty$ norm is particularly sensitive to small updates.
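
At a high level, the alternating scheme amounts to a training loop that switches its objective whenever the running fooling rate crosses the target: optimize for fooling while below the target, and for a smaller perturbation norm once above it. The sketch below is a minimal, assumed PyTorch illustration of batch-level alternation combined with a circular mask; the helper names (`train_uap`, `circular_mask`), the exact switching rule, optimizer, image size, and mask radius are placeholders, not the paper's implementation.

```python
# Minimal sketch of a batch-level alternating-loss UAP training loop (assumed PyTorch).
# The switching rule, optimizer, image size, and mask radius are illustrative assumptions.
import torch
import torch.nn.functional as F


def circular_mask(size: int, radius_fraction: float = 0.9) -> torch.Tensor:
    """Binary mask keeping only a centered disc of the perturbation
    (an assumed form of the circular masking / filtering step)."""
    ys, xs = torch.meshgrid(torch.arange(size), torch.arange(size), indexing="ij")
    center = (size - 1) / 2.0
    dist = ((ys - center) ** 2 + (xs - center) ** 2).float().sqrt()
    return (dist <= radius_fraction * size / 2).float()


def train_uap(model, loader, target_fooling_rate=0.95, epsilon=10.0,
              lr=0.01, epochs=10, image_size=224, device="cpu"):
    model.eval()
    # One perturbation shared by every image.
    delta = torch.zeros(1, 3, image_size, image_size, device=device, requires_grad=True)
    mask = circular_mask(image_size).to(device)
    optimizer = torch.optim.Adam([delta], lr=lr)

    for _ in range(epochs):
        fooled, seen = 0, 0
        for images, _ in loader:
            images = images.to(device)
            with torch.no_grad():
                clean_pred = model(images).argmax(dim=1)  # labels to move away from
            adv_logits = model(images + delta * mask)
            fooled += (adv_logits.argmax(dim=1) != clean_pred).sum().item()
            seen += images.size(0)

            # Alternation condition: below the target fooling rate, maximize the
            # classification error; above it, shrink the perturbation norm instead.
            if fooled / seen < target_fooling_rate:
                loss = -F.cross_entropy(adv_logits, clean_pred)  # fooling objective
            else:
                loss = (delta * mask).norm(p=2)                  # norm-minimization objective

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Keep the perturbation bounded (an L_inf-style projection).
            with torch.no_grad():
                delta.clamp_(-epsilon, epsilon)

    return (delta * mask).detach()
```

In this reading, the Epoch-Batch and Progressive variants would differ mainly in when and how the switching condition is evaluated; the paper defines the exact rules for each loss.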
