1 code implementation • 8 Apr 2024 • Franz A. Heinsen
We propose a simple modification to the conventional attention mechanism applied by Transformers: Instead of quantifying pairwise query-key similarity with scaled dot-products, we quantify it with the logarithms of scaled dot-products of exponentials.
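The abstract describes the new similarity only in words. The sketch below takes one reading of it: the score for query $q$ and key $k$ is $\log\big(\exp(q) \cdot \exp(k) / c\big)$, where $\exp$ is applied elementwise and $c$ is a positive constant. The function names, the constant `scale`, and the causal setup are illustrative assumptions, and the paper's exact placement of the scaling may differ. Exponentiating this score inside the softmax gives weights proportional to $\exp(q) \cdot \exp(k_j)$, so $c$ cancels. The attention output can then be computed either over all past keys or from two running sums, $\sum_j \exp(k_j) v_j^\top$ and $\sum_j \exp(k_j)$, whose size does not grow with sequence length. The sketch checks that the two forms agree.

```python
import math
from typing import List

Vector = List[float]


def exp_vec(v: Vector) -> Vector:
    """Elementwise exponential."""
    return [math.exp(x) for x in v]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    e = [math.exp(s - m) for s in scores]
    total = sum(e)
    return [x / total for x in e]


def attend_quadratic(queries, keys, values, scale=1.0):
    """Causal attention with similarity log(exp(q) . exp(k) / scale).

    `scale` is a hypothetical positive constant; it cancels inside the softmax.
    Cost for token t grows with t, since it scores every earlier key.
    """
    outputs = []
    for t, q in enumerate(queries):
        eq = exp_vec(q)
        scores = [math.log(dot(eq, exp_vec(k)) / scale) for k in keys[: t + 1]]
        w = softmax(scores)
        outputs.append([
            sum(w[j] * values[j][c] for j in range(t + 1))
            for c in range(len(values[0]))
        ])
    return outputs


def attend_recurrent(queries, keys, values):
    """Same outputs from running sums; each token costs O(d * d_v) work."""
    d, dv = len(keys[0]), len(values[0])
    S = [[0.0] * dv for _ in range(d)]  # running sum of exp(k_j) v_j^T
    z = [0.0] * d                       # running sum of exp(k_j)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        ek = exp_vec(k)
        for i in range(d):
            z[i] += ek[i]
            for c in range(dv):
                S[i][c] += ek[i] * v[c]
        eq = exp_vec(q)
        denom = dot(eq, z)
        outputs.append([sum(eq[i] * S[i][c] for i in range(d)) / denom
                        for c in range(dv)])
    return outputs


# Small made-up example: 3 tokens, key/query size 2, value size 3.
queries = [[0.1, -0.2], [0.3, 0.0], [-0.5, 0.4]]
keys = [[0.2, 0.1], [-0.1, 0.3], [0.0, -0.2]]
values = [[1.0, 2.0, -1.0], [0.5, -0.5, 0.0], [3.0, 1.0, 2.0]]

quad = attend_quadratic(queries, keys, values, scale=2.0)
rec = attend_recurrent(queries, keys, values)
```

The quadratic and recurrent outputs agree up to floating-point rounding. The first token attends only to itself, so its output equals `values[0]`. This sketch exponentiates raw entries and can overflow for large inputs. A practical implementation would likely work in log-space, which this sketch does not attempt.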
1 code implementation • 27 Oct 2023 • Franz A. Heinsen
We find a succinct expression for computing the sequence $x_t = a_t x_{t-1} + b_t$ in parallel with two prefix sums, given $t = (1, 2, \dots, n)$, $a = (a_1, \dots, a_n) \in \mathbb{R}^n$, $b = (b_1, \dots, b_n) \in \mathbb{R}^n$, and initial value $x_0 \in \mathbb{R}$.
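Unrolling the recurrence shows why two scans are enough. With $A_t = \prod_{i=1}^{t} a_i$ and every $a_i \neq 0$, one gets $x_t = A_t \big(x_0 + \sum_{i=1}^{t} b_i / A_i\big)$. For example, $x_2 = a_1 a_2 x_0 + a_2 b_1 + b_2$, which matches the formula. The sketch below is an assumption-laden illustration, not the paper's expression. It computes $A_t$ with a cumulative product, which becomes a prefix sum of $\log a_i$ in log-space, and computes $\sum_{i \le t} b_i / A_i$ with a cumulative sum. It then compares the result against the plain sequential loop. The function names are hypothetical. `itertools.accumulate` runs sequentially here, but both scans are associative and could run as parallel prefix scans.

```python
import operator
from itertools import accumulate
from typing import List


def recurrence_sequential(a: List[float], b: List[float], x0: float) -> List[float]:
    """Reference: x_t = a_t * x_{t-1} + b_t, computed one step at a time."""
    xs, x = [], x0
    for at, bt in zip(a, b):
        x = at * x + bt
        xs.append(x)
    return xs


def recurrence_two_scans(a: List[float], b: List[float], x0: float) -> List[float]:
    """Same sequence from two prefix scans (assumes every a_t != 0)."""
    # Scan 1: A_t = a_1 * ... * a_t (a prefix sum of log a_t in log-space).
    A = list(accumulate(a, operator.mul))
    # Scan 2: B_t = sum_{i<=t} b_i / A_i.
    B = list(accumulate(bt / At for bt, At in zip(b, A)))
    # Combine elementwise: x_t = A_t * (x0 + B_t).
    return [At * (x0 + Bt) for At, Bt in zip(A, B)]


a = [0.5, 2.0, -1.0, 3.0]
b = [1.0, -2.0, 0.5, 4.0]
seq = recurrence_sequential(a, b, x0=1.0)  # [1.5, 1.0, -0.5, 2.5]
par = recurrence_two_scans(a, b, x0=1.0)   # same values up to rounding
```

Raw products like $A_t$ can overflow or underflow on long sequences. That is one reason to evaluate both scans in log-space. Doing so needs extra care when some $a_t$ or $b_t$ are negative.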
1 code implementation • 20 Nov 2022 • Franz A. Heinsen
We propose a routing algorithm that takes a sequence of vectors and computes a new sequence with specified length and vector size.
2 code implementations • 21 Sep 2022 • Franz A. Heinsen
We propose methods that enable efficient hierarchical classification in parallel.
1 code implementation • 2 Nov 2019 • Franz A. Heinsen
Building on recent work on capsule networks, we propose a new, general-purpose form of "routing by agreement" that activates output capsules in a layer as a function of their net benefit to use and net cost to ignore input capsules from earlier layers.
Ranked #1 on Image Classification on smallNORB