Attention Attention Everywhere: Monocular Depth Prediction with Skip Attention

17 Oct 2022 · Ashutosh Agarwal, Chetan Arora

Monocular Depth Estimation (MDE) aims to predict pixel-wise depth from a single RGB image. For both convolutional and recent attention-based models, encoder-decoder architectures have proven useful due to the simultaneous requirement of global context and pixel-level resolution. Typically, a skip connection module is used to fuse the encoder and decoder features, comprising feature map concatenation followed by a convolution operation. Inspired by the demonstrated benefits of attention in a multitude of computer vision problems, we propose an attention-based fusion of encoder and decoder features. We pose MDE as a pixel query refinement problem, where coarsest-level encoder features are used to initialize pixel-level queries, which are then refined to higher resolutions by the proposed Skip Attention Module (SAM). We formulate the prediction problem as ordinal regression over the bin centers that discretize the continuous depth range, and introduce a Bin Center Predictor (BCP) module that predicts the bins at the coarsest level using the pixel queries. Apart from the benefit of image-adaptive depth binning, the proposed design helps learn improved depth embeddings in the initial pixel queries via direct supervision from the ground truth. Extensive experiments on the two canonical datasets, NYUV2 and KITTI, show that our architecture outperforms the state of the art by 5.3% and 3.9%, respectively, along with an improved generalization performance of 9.4% on the SUNRGBD dataset. Code is available at https://github.com/ashutosh1807/PixelFormer.git.
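To make the two ideas above concrete, here is a minimal PyTorch sketch, not the authors' implementation: (1) attention-based skip fusion, where decoder pixel queries cross-attend to same-resolution encoder features instead of being concatenated and convolved with them, and (2) depth prediction as a softmax-weighted sum over image-adaptive bin centers. All module and parameter names here are illustrative assumptions, not taken from the PixelFormer code.

```python
import torch
import torch.nn as nn

class SkipAttentionModule(nn.Module):
    """Fuse encoder and decoder features with cross-attention: decoder
    pixel queries attend to the same-resolution encoder feature map."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, queries, enc_feat):
        # queries, enc_feat: (B, C, H, W); flatten spatial dims into tokens
        B, C, H, W = queries.shape
        q = queries.flatten(2).transpose(1, 2)    # (B, H*W, C) pixel queries
        kv = enc_feat.flatten(2).transpose(1, 2)  # (B, H*W, C) encoder tokens
        fused, _ = self.attn(q, kv, kv)           # cross-attention fusion
        fused = self.norm(fused + q)              # residual + layer norm
        return fused.transpose(1, 2).reshape(B, C, H, W)

def depth_from_bins(logits, bin_centers):
    """Ordinal-regression-style depth: per-pixel softmax over N bin
    centers predicted per image, then a weighted sum over the centers.
    logits: (B, N, H, W); bin_centers: (B, N)."""
    probs = torch.softmax(logits, dim=1)
    depth = torch.einsum('bnhw,bn->bhw', probs, bin_centers)
    return depth.unsqueeze(1)  # (B, 1, H, W) depth map
```

Predicting depth as a weighted sum over bin centers keeps the output differentiable end-to-end while still letting the discretization of the depth range adapt to each input image.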


Results from the Paper


Task: Monocular Depth Estimation · Model: PixelFormer

KITTI Eigen split (uses extra training data)

    Metric                    Value   Global Rank
    Absolute relative error   0.051   #19
    RMSE                      2.081   #20
    Sq Rel                    0.149   #9
    RMSE log                  0.077   #18
    Delta < 1.25              0.976   #18
    Delta < 1.25^2            0.997   #16
    Delta < 1.25^3            0.999   #11

NYU-Depth V2

    Metric                    Value   Global Rank
    RMSE                      0.322   #27
    Absolute relative error   0.090   #27
    Delta < 1.25              0.929   #28
    Delta < 1.25^2            0.991   #21
    Delta < 1.25^3            0.998   #18
    log10                     0.039   #27
