PQ-Transformer, or PointQuad-Transformer, is a Transformer-based architecture that predicts 3D objects and room layouts simultaneously from point cloud inputs. Unlike existing methods that estimate layout keypoints or edges, PQ-Transformer directly parameterizes the room layout as a set of quads. Alongside the quad representation, a physical constraint loss discourages object-layout interference.
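To make the quad representation concrete, here is a minimal sketch of one plausible parameterization: a quad described by its center, plane normal, and 2D size, from which the four corners can be recovered. The exact fields and their encoding in PQ-Transformer may differ; this `Quad` class, its `corners` method, and the vertical-wall assumption (normal not parallel to the up axis) are illustrative assumptions, not the paper's definition.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class Quad:
    """Hypothetical layout-quad parameterization: center + normal + size."""
    center: np.ndarray  # (3,) quad center in world coordinates
    normal: np.ndarray  # (3,) unit normal of the quad's plane
    size: np.ndarray    # (2,) extent along the two in-plane axes

    def corners(self) -> np.ndarray:
        """Return the 4 corner points, shape (4, 3).

        Assumes the quad is not horizontal (normal not parallel to +Z),
        which holds for wall quads.
        """
        up = np.array([0.0, 0.0, 1.0])
        u = np.cross(up, self.normal)          # in-plane horizontal axis
        u = u / np.linalg.norm(u)
        v = np.cross(self.normal, u)           # in-plane vertical axis
        w, h = self.size / 2.0
        return np.array([self.center + s * w * u + t * h * v
                         for s in (-1, 1) for t in (-1, 1)])

# Example: a 4 m wide, 3 m tall wall facing +X, centered 1.5 m above the floor.
wall = Quad(center=np.array([0.0, 0.0, 1.5]),
            normal=np.array([1.0, 0.0, 0.0]),
            size=np.array([4.0, 3.0]))
```

Representing each layout element as a single quad keeps the prediction target compact (a handful of continuous values per wall, floor, or ceiling) compared to regressing individual keypoints or edges.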
Given an input 3D point cloud of $N$ points, the point cloud feature learning backbone extracts $M$ context-aware point features of $\left(3+C\right)$ dimensions through sampling and grouping. A voting module and a farthest point sampling (FPS) module then generate $K_{1}$ object proposals and $K_{2}$ quad proposals, respectively. A transformer decoder further refines the proposal features, and after several feed-forward layers and non-maximum suppression (NMS), the proposals yield the final 3D object bounding boxes and layout quads.
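The FPS step named above is a standard algorithm and can be sketched directly: starting from an arbitrary seed, it greedily picks the point farthest from everything selected so far, giving $K_{2}$ well-spread seed locations for quad proposals. This is a generic numpy implementation for illustration, not the paper's (typically batched, GPU-side) version.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, k: int) -> np.ndarray:
    """Greedy FPS: return indices of k points that maximize mutual spread.

    points: (N, 3) array of xyz coordinates; k <= N.
    """
    n = points.shape[0]
    selected = np.zeros(k, dtype=np.int64)   # first seed is point 0 (arbitrary)
    # dists[j] = distance from point j to its nearest already-selected point
    dists = np.full(n, np.inf)
    for i in range(1, k):
        d = np.linalg.norm(points - points[selected[i - 1]], axis=1)
        dists = np.minimum(dists, d)
        selected[i] = int(np.argmax(dists))  # farthest remaining point
    return selected

# Example: four corners of a square plus its center; FPS with k=4
# recovers the corners and skips the (redundant) center point.
pts = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0],
                [10.0, 10.0, 0.0], [5.0, 5.0, 0.0]])
idx = farthest_point_sampling(pts, 4)
```

Using FPS rather than voting for quad proposals matches the geometry of the task: walls, floors, and ceilings are large and spatially spread out, so evenly distributed seeds cover them well.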
Source: PQ-Transformer: Jointly Parsing 3D Objects and Layouts from Point Clouds