Recurrent Scene Parsing with Perspective Understanding in the Loop

CVPR 2018 · Shu Kong, Charless Fowlkes

Objects may appear at arbitrary scales in perspective images of a scene, posing a challenge for recognition systems that process images at a fixed resolution. We propose a depth-aware gating module that adaptively selects the pooling field size in a convolutional network architecture according to the object scale (inversely proportional to the depth) so that small details are preserved for distant objects while larger receptive fields are used for those nearby. The depth gating signal is provided by stereo disparity or estimated directly from monocular input. We integrate this depth-aware gating into a recurrent convolutional neural network to perform semantic segmentation. Our recurrent module iteratively refines the segmentation results, leveraging the depth and semantic predictions from the previous iterations. Through extensive experiments on four popular large-scale RGB-D datasets, we demonstrate this approach achieves competitive semantic segmentation performance with a model which is substantially more compact. We carry out extensive analysis of this architecture including variants that operate on monocular RGB but use depth as side-information during training, unsupervised gating as a generic attentional mechanism, and multi-resolution gating. We find that gated pooling for joint semantic segmentation and depth yields state-of-the-art results for quantitative monocular depth estimation.
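To make the gating idea concrete, here is a minimal PyTorch sketch of depth-gated multi-scale pooling. This is an illustrative assumption, not the authors' released code: the class name `DepthAwareGating`, the average-pooling field sizes, and the single-convolution gating branch are all placeholders. The paper's actual module is embedded in a ResNet-based segmentation network and derives its gating signal from stereo disparity or monocularly estimated depth.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthAwareGating(nn.Module):
    """Sketch of depth-gated multi-scale pooling (assumed design).

    Features are pooled at several field sizes; a gating branch driven
    by the depth map predicts per-pixel weights over those sizes, so
    distant (small-appearing) objects favor small pooling fields while
    nearby (large-appearing) objects favor large ones.
    """

    def __init__(self, pool_sizes=(1, 3, 5, 7)):
        super().__init__()
        self.pool_sizes = pool_sizes
        # One gate map per pooling scale, predicted from depth alone
        # (hypothetical choice; the paper learns this end to end).
        self.gate = nn.Conv2d(1, len(pool_sizes), kernel_size=3, padding=1)

    def forward(self, features, depth):
        # depth: (N, 1, H0, W0) disparity or depth, resized to the
        # feature grid before gating.
        depth = F.interpolate(depth, size=features.shape[2:],
                              mode="bilinear", align_corners=False)
        # Per-pixel soft selection over pooling field sizes.
        weights = torch.softmax(self.gate(depth), dim=1)  # (N, S, H, W)
        out = torch.zeros_like(features)
        for i, k in enumerate(self.pool_sizes):
            # Stride-1 pooling with "same" padding keeps spatial size
            # (all field sizes here are odd).
            pooled = F.avg_pool2d(features, kernel_size=k, stride=1,
                                  padding=k // 2)
            out = out + weights[:, i:i + 1] * pooled
        return out

# Example usage on dummy tensors:
gate = DepthAwareGating()
feats = torch.randn(2, 256, 48, 64)     # backbone features
depth = torch.rand(2, 1, 192, 256)      # raw-resolution depth map
out = gate(feats, depth)                # -> (2, 256, 48, 64)
```

The soft per-pixel mixture over pooling scales stands in for the paper's adaptive selection of pooling field size; in the full model this gating is applied inside the network and combined with the recurrent refinement loop described above.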


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Uses Extra Training Data |
|---|---|---|---|---|---|---|
| Semantic Segmentation | Cityscapes test | DepthSeg (ResNet-101) | Mean IoU (class) | 78.2% | #60 | |
| Semantic Segmentation | NYU Depth v2 | RecurrentSceneParsing | Mean IoU | 44.5% | #83 | |
| Semantic Segmentation | SUN-RGBD | DPLNet | Mean IoU | 45.1% | #32 | Yes |
