Embedded Gaussian Affinity is a type of affinity or self-similarity function between two points $\mathbf{x_{i}}$ and $\mathbf{x_{j}}$ that uses a Gaussian function in an embedding space:
$$ f\left(\mathbf{x_{i}}, \mathbf{x_{j}}\right) = e^{\theta\left(\mathbf{x_{i}}\right)^{T}\phi\left(\mathbf{x_{j}}\right)} $$
Here $\theta\left(\mathbf{x}_{i}\right) = W_{\theta}\mathbf{x}_{i}$ and $\phi\left(\mathbf{x}_{j}\right) = W_{\phi}\mathbf{x}_{j}$ are two embeddings.
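
As a minimal sketch of the definition above, the pairwise affinities can be computed for a small set of points with NumPy. The function name `embedded_gaussian_affinity`, the row-per-point layout of `x`, and the random weights are assumptions made for illustration, not part of the source.

```python
import numpy as np

def embedded_gaussian_affinity(x, w_theta, w_phi):
    """Return f with f[i, j] = exp(theta(x_i)^T phi(x_j)).

    x:        (n, d) array, one point per row (layout assumed for this sketch)
    w_theta:  (d_e, d) matrix for the embedding theta(x) = W_theta x
    w_phi:    (d_e, d) matrix for the embedding phi(x) = W_phi x
    """
    theta = x @ w_theta.T          # (n, d_e), row i is theta(x_i)
    phi = x @ w_phi.T              # (n, d_e), row j is phi(x_j)
    return np.exp(theta @ phi.T)   # (n, n) matrix of pairwise affinities

# Hypothetical toy data: 4 points in 8 dimensions, embedded into 3 dimensions.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_theta = 0.1 * rng.normal(size=(3, 8))
w_phi = 0.1 * rng.normal(size=(3, 8))

f = embedded_gaussian_affinity(x, w_theta, w_phi)
```

Because $\theta$ and $\phi$ use different weight matrices, the resulting affinity matrix is in general not symmetric.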
Note that the self-attention module used in the original Transformer model is a special case of non-local operations in the embedded Gaussian version. To see this, set the normalization factor to $\mathcal{C}\left(\mathbf{x}\right) = \sum_{\forall{j}}f\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)$. Then, for a given $i$, $\frac{1}{\mathcal{C}\left(\mathbf{x}\right)}\sum_{\forall{j}}f\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)g\left(\mathbf{x}_{j}\right)$ becomes the softmax computation along the dimension $j$, so $\mathbf{y} = \text{softmax}\left(\mathbf{x}^{T}W^{T}_{\theta}W_{\phi}\mathbf{x}\right)g\left(\mathbf{x}\right)$, which is the self-attention form in the Transformer model. This relates the recent self-attention model to the classic computer vision method of non-local means.
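
The equivalence can be checked numerically. The sketch below computes the non-local operation with explicit normalization by the row sums of $f$ and compares it with a softmax-based self-attention form. It assumes a linear $g\left(\mathbf{x}_{j}\right) = W_{g}\mathbf{x}_{j}$ and a row-per-position layout for `x`; the function names and the random toy data are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row maximum for numerical stability; the result is unchanged.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def non_local_embedded_gaussian(x, w_theta, w_phi, w_g):
    """y_i = (1 / C(x)) * sum_j f(x_i, x_j) g(x_j), with C(x) = sum_j f(x_i, x_j)."""
    scores = (x @ w_theta.T) @ (x @ w_phi.T).T   # scores[i, j] = theta(x_i)^T phi(x_j)
    f = np.exp(scores)                           # embedded Gaussian affinities
    c = f.sum(axis=1, keepdims=True)             # normalization factor per position i
    g = x @ w_g.T                                # assumed linear g(x_j) = W_g x_j
    return (f / c) @ g

def self_attention_form(x, w_theta, w_phi, w_g):
    """y = softmax(scores) g(x), with the softmax taken along dimension j."""
    scores = (x @ w_theta.T) @ (x @ w_phi.T).T
    return softmax(scores, axis=1) @ (x @ w_g.T)

# Hypothetical toy data: 5 positions, 8 input dims, 3 embedding dims, 6 output dims.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
w_theta = 0.1 * rng.normal(size=(3, 8))
w_phi = 0.1 * rng.normal(size=(3, 8))
w_g = 0.1 * rng.normal(size=(6, 8))

y_nonlocal = non_local_embedded_gaussian(x, w_theta, w_phi, w_g)
y_attention = self_attention_form(x, w_theta, w_phi, w_g)
print(np.allclose(y_nonlocal, y_attention))
```

The two functions differ only in how the normalization is written, which is why their outputs agree up to floating-point error.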
Source: Non-local Neural Networks

Task | Papers | Share |
---|---|---|
Object Detection | 3 | 21.43% |
Nutrition | 1 | 7.14% |
Ensemble Learning | 1 | 7.14% |
Medical Object Detection | 1 | 7.14% |
Food recommendation | 1 | 7.14% |
Object Localization | 1 | 7.14% |
Action Classification | 1 | 7.14% |
Action Recognition | 1 | 7.14% |
Instance Segmentation | 1 | 7.14% |