Pixab-CAM: Attend Pixel, not Channel

29 Sep 2021 · Jaeeun Jang, Seokjun Kim, Hyeoncheol Kim

To understand the internal behavior of convolutional neural networks (CNNs), many class activation mapping (CAM) based methods have been proposed; these generate an explanation map as a linear combination of the channels of an activation tensor and their corresponding weights. Previous CAM-based methods have tried to define a channel-wise weight that represents the importance of each channel for the target class. However, these methods share two limitations. First, all pixels in a channel share a single scalar weight, and tying every pixel to the same value means the importance of some pixels is overestimated. Second, because the explanation map is a linear combination of the channels in the activation tensor, it inevitably depends on that tensor. To address these issues, we propose the gradient-free Pixel-wise Ablation-CAM (Pixab-CAM), which uses pixel-wise weights rather than channel-wise weights to break the link between pixels in a channel. In addition, so that the explanation map does not depend on the activation tensor, it is generated from the pixel-wise weights alone, without a linear combination with the activation tensor. We also propose novel evaluation metrics that measure the quality of explanation maps using an adversarial attack. Experiments demonstrate the qualitative and quantitative superiority of Pixab-CAM.
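
The contrast between channel-wise and pixel-wise weighting can be illustrated with a small NumPy sketch. The abstract does not specify how Pixab-CAM computes its pixel-wise weights, so everything below is an assumption made for illustration: the toy classifier head (`toy_score`, global max pooling followed by fixed weights `HEAD_WEIGHTS`), the function names, and the choice to ablate one spatial location across all channels and use the resulting score drop as that location's weight. It is not the authors' implementation.

```python
import numpy as np

# Hypothetical per-channel weights of a toy classifier head (assumption).
HEAD_WEIGHTS = np.array([1.0, 0.5])

def toy_score(activations):
    """Toy class score: global max pooling per channel, then a weighted sum."""
    return float(np.dot(HEAD_WEIGHTS, activations.max(axis=(1, 2))))

def channel_wise_cam(activations, weights):
    """CAM-style map: one scalar weight per channel, linearly combined.

    activations has shape (C, H, W); weights has shape (C,).
    """
    return np.maximum(np.tensordot(weights, activations, axes=1), 0.0)

def pixel_ablation_map(activations, score_fn):
    """Illustrative pixel-wise ablation (an assumed procedure, not the paper's).

    The weight of location (i, j) is the drop in class score when that
    location is zeroed in every channel. The map is built from these
    weights alone, with no multiplication by the activation tensor.
    """
    base = score_fn(activations)
    _, height, width = activations.shape
    weights = np.zeros((height, width))
    for i in range(height):
        for j in range(width):
            ablated = activations.copy()
            ablated[:, i, j] = 0.0
            weights[i, j] = base - score_fn(ablated)
    return np.maximum(weights, 0.0)

# Two 2x2 channels of made-up activations.
acts = np.array([
    [[1.0, 2.0], [0.0, 4.0]],
    [[3.0, 0.0], [0.0, 0.0]],
])

cam = channel_wise_cam(acts, HEAD_WEIGHTS)
abl = pixel_ablation_map(acts, toy_score)
```

In this toy setting the channel-wise map assigns a nonzero value to location (0, 1) simply because the channel is active there, while the ablation map gives it zero because removing it does not change the score. This mirrors, in a deliberately simplified way, the overestimation concern raised above.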
