Position
681 papers with code • 0 benchmarks • 0 datasets
Most implemented papers
Non-local Neural Networks
Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time.
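In contrast to those local operations, the paper's non-local operation computes the response at one position as a weighted sum over features at all positions. A minimal NumPy sketch of the embedded-Gaussian variant in self-attention form (the learned 1x1 projections and the residual scaling of the full block are omitted here for brevity):

```python
import numpy as np

def non_local(x):
    """Simplified non-local operation: each position's output is a
    softmax-weighted sum of the features at every position.
    x: (num_positions, channels)."""
    sim = x @ x.T                                 # pairwise similarities f(x_i, x_j)
    w = np.exp(sim - sim.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)             # rows sum to 1 (softmax)
    return w @ x                                  # (num_positions, channels)
```

Because every output position attends to every input position, the receptive field is global in a single layer, rather than growing gradually through stacked local operations.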
RoFormer: Enhanced Transformer with Rotary Position Embedding
Then, we propose a novel method named Rotary Position Embedding (RoPE) to effectively leverage the positional information.
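The core idea is to rotate each pair of feature dimensions by an angle proportional to the token's position, so that the dot product between a query at position m and a key at position n depends only on their relative offset m - n. A minimal NumPy sketch (frequency base 10000 as in the paper; applied here to a plain feature matrix rather than inside an attention layer):

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary position embedding to x of shape (seq_len, dim).

    Feature pair (2i, 2i+1) at position p is rotated by angle
    p * theta_i, with theta_i = base**(-2i/dim)."""
    seq_len, dim = x.shape
    pos = np.arange(seq_len)[:, None]               # (seq_len, 1)
    theta = base ** (-np.arange(0, dim, 2) / dim)   # (dim/2,)
    angles = pos * theta                            # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin              # 2D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

A quick check of the relative-position property: shifting both a query and a key by the same number of positions leaves their dot product unchanged.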
Self-Attention with Relative Position Representations
On the WMT 2014 English-to-German and English-to-French translation tasks, this approach yields improvements of 1.3 BLEU and 0.3 BLEU over absolute position representations, respectively.
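The method replaces absolute position embeddings with learned embeddings indexed by the clipped relative distance between each query and key position. A minimal NumPy sketch of the distance bucketing, with a randomly initialized embedding table standing in for the learned parameters:

```python
import numpy as np

def relative_position_bucket(seq_len, max_distance=4):
    """Clipped relative distances j - i, shifted to non-negative
    indices in [0, 2 * max_distance]."""
    pos = np.arange(seq_len)
    rel = pos[None, :] - pos[:, None]            # (seq_len, seq_len)
    return np.clip(rel, -max_distance, max_distance) + max_distance

# a_K[r] would be a learned embedding per clipped distance; random here
k = 4
a_K = np.random.randn(2 * k + 1, 8)              # (num_distances, head_dim)
idx = relative_position_bucket(6, k)
rel_emb = a_K[idx]                               # (6, 6, 8), added to keys in attention
```

Clipping to a maximum distance keeps the parameter count fixed regardless of sequence length, and lets the model generalize to distances unseen during training.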
Dual Attention Network for Scene Segmentation
Specifically, we append two types of attention modules on top of traditional dilated FCN, which model the semantic interdependencies in spatial and channel dimensions respectively.
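The channel attention module can be illustrated with a simplified NumPy sketch: channels are re-weighted by the softmax-normalized affinity between channel maps (the learned scaling factor on the attention branch, which the paper initializes at zero, is omitted here):

```python
import numpy as np

def channel_attention(x):
    """Simplified channel attention: inter-channel affinities
    re-weight the channel maps, with a residual connection.
    x: (channels, height * width)."""
    energy = x @ x.T                               # (C, C) channel affinities
    att = np.exp(energy - energy.max(axis=1, keepdims=True))
    att /= att.sum(axis=1, keepdims=True)          # softmax over channels
    return att @ x + x                             # residual sum
```

The spatial (position) attention module is analogous, but computes affinities between spatial locations instead of between channels.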
3D human pose estimation in video with temporal convolutions and semi-supervised training
We start with predicted 2D keypoints for unlabeled video, then estimate 3D poses and finally back-project to the input 2D keypoints.
A Transformer-based Approach for Source Code Summarization
Generating a readable summary that describes the functionality of a program is known as source code summarization.
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
The results demonstrate that Vim is capable of overcoming the computation & memory constraints on performing Transformer-style understanding for high-resolution images and it has great potential to be the next-generation backbone for vision foundation models.
Deep Domain Confusion: Maximizing for Domain Invariance
Recent reports suggest that a generic supervised deep CNN model trained on a large-scale dataset reduces, but does not remove, dataset bias on a standard benchmark.
The Case for Learned Index Structures
Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not.
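That framing can be made concrete with a toy learned index: fit a model from key to position over a sorted array, record the maximum prediction error at build time, and restrict each lookup to the guaranteed error window. A minimal sketch using a least-squares linear model (the paper uses hierarchies of learned models; hypothetical helper names):

```python
import bisect
import numpy as np

def build_learned_index(keys):
    """Fit key -> position with least squares over a sorted array and
    record the worst-case error, bounding the search window at lookup."""
    keys = np.asarray(keys, dtype=float)
    pos = np.arange(len(keys))
    slope, intercept = np.polyfit(keys, pos, 1)
    pred = np.round(slope * keys + intercept).astype(int)
    max_err = int(np.max(np.abs(pred - pos)))
    return slope, intercept, max_err

def lookup(keys, model, key):
    slope, intercept, max_err = model
    guess = int(round(slope * key + intercept))
    lo = max(0, guess - max_err)
    hi = min(len(keys), guess + max_err + 1)
    # exact binary search inside the guaranteed window
    i = lo + bisect.bisect_left(keys[lo:hi].tolist(), key)
    return i if i < len(keys) and keys[i] == key else -1
```

On near-uniform key distributions the error window stays small, so each lookup costs one model evaluation plus a short bounded search, mirroring how a B-Tree narrows a key to a page.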
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Since the introduction of the transformer model by Vaswani et al. (2017), a fundamental question has yet to be answered: how does a model achieve extrapolation at inference time for sequences that are longer than it saw during training?
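ALiBi answers this by dropping position embeddings entirely and instead adding a fixed, head-specific linear penalty to attention scores, proportional to the query-key distance. A minimal NumPy sketch of the bias tensor, using the paper's geometric slope schedule for the number of heads:

```python
import numpy as np

def alibi_bias(seq_len, num_heads):
    """ALiBi bias: head h adds slope_h * (j - i) to the score of query i
    attending to key j, so more distant past keys are penalized linearly.
    Slopes form the geometric sequence 2**(-8h/num_heads), h = 1..num_heads."""
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    pos = np.arange(seq_len)
    dist = pos[None, :] - pos[:, None]          # j - i; <= 0 for causal keys
    return slopes[:, None, None] * dist[None, :, :]   # (heads, seq, seq)
```

Because the penalty is a function of distance rather than absolute position, the same bias extends naturally to sequences longer than any seen during training, which is the source of the extrapolation behavior the paper studies.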