Training Universal Adversarial Perturbations with Alternating Loss Functions

Despite their success, deep learning models have been shown to be vulnerable to carefully crafted perturbations. Moreover, it is possible to change a network's prediction on virtually any image by learning a single universal adversarial perturbation (UAP). In this work, we propose three ways of training UAPs that attain a predefined fooling rate while jointly minimizing the $L_2$ or $L_\infty$ norm of the perturbation. To stabilize the attack around the predefined fooling rate, we integrate an alternating loss function scheme that switches the active loss function based on a given condition. Specifically, the proposed loss functions are Batch Alternating Loss, Epoch-Batch Alternating Loss, and Progressive Alternating Loss. In addition, we empirically observe that UAPs learned by minimization attacks contain strong image-like features around the edges, so we propose integrating a circular masking operation into training to further reduce visible perturbations. The proposed $L_2$ Progressive Alternating Loss method outperforms popular attacks by achieving a higher fooling rate at equal $L_2$ norm. Furthermore, Filtered Progressive Alternating Loss reduces the $L_2$ norm by an additional 33.3% at the same fooling rate. When optimized with respect to $L_\infty$, Progressive Alternating Loss stabilizes at the desired fooling rate of 95% with only 1 percentage point of deviation, even though the $L_\infty$ norm is particularly sensitive to small updates.
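
At a high level, the alternating scheme amounts to a training loop that switches its objective whenever the running fooling rate crosses the target: optimize for fooling while below the target, and for a smaller perturbation norm once above it. The sketch below is a minimal, assumed PyTorch illustration of batch-level alternation combined with a circular mask; the helper names (`train_uap`, `circular_mask`), the exact switching rule, optimizer, image size, and mask radius are placeholders, not the paper's implementation.

```python
# Minimal sketch of a batch-level alternating-loss UAP training loop (assumed PyTorch).
# The switching rule, optimizer, image size, and mask radius are illustrative assumptions.
import torch
import torch.nn.functional as F


def circular_mask(size: int, radius_fraction: float = 0.9) -> torch.Tensor:
    """Binary mask keeping only a centered disc of the perturbation
    (an assumed form of the circular masking / filtering step)."""
    ys, xs = torch.meshgrid(torch.arange(size), torch.arange(size), indexing="ij")
    center = (size - 1) / 2.0
    dist = ((ys - center) ** 2 + (xs - center) ** 2).float().sqrt()
    return (dist <= radius_fraction * size / 2).float()


def train_uap(model, loader, target_fooling_rate=0.95, epsilon=10.0,
              lr=0.01, epochs=10, image_size=224, device="cpu"):
    model.eval()
    # One perturbation shared by every image.
    delta = torch.zeros(1, 3, image_size, image_size, device=device, requires_grad=True)
    mask = circular_mask(image_size).to(device)
    optimizer = torch.optim.Adam([delta], lr=lr)

    for _ in range(epochs):
        fooled, seen = 0, 0
        for images, _ in loader:
            images = images.to(device)
            with torch.no_grad():
                clean_pred = model(images).argmax(dim=1)  # labels to move away from
            adv_logits = model(images + delta * mask)
            fooled += (adv_logits.argmax(dim=1) != clean_pred).sum().item()
            seen += images.size(0)

            # Alternation condition: below the target fooling rate, maximize the
            # classification error; above it, shrink the perturbation norm instead.
            if fooled / seen < target_fooling_rate:
                loss = -F.cross_entropy(adv_logits, clean_pred)  # fooling objective
            else:
                loss = (delta * mask).norm(p=2)                  # norm-minimization objective

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Keep the perturbation bounded (an L_inf-style projection).
            with torch.no_grad():
                delta.clamp_(-epsilon, epsilon)

    return (delta * mask).detach()
```

In this reading, the Epoch-Batch and Progressive variants would differ mainly in when and how the switching condition is evaluated; the paper defines the exact rules for each loss.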
