no code implementations • 9 Feb 2024 • Siming Yan, Min Bai, Weifeng Chen, Xiong Zhou, QiXing Huang, Li Erran Li
By combining the natural language understanding, generation capabilities, and breadth of knowledge of large language models with image perception, recent large vision-language models (LVLMs) have shown unprecedented visual reasoning capabilities.
no code implementations • 12 Jan 2024 • Shengyi Qian, Weifeng Chen, Min Bai, Xiong Zhou, Zhuowen Tu, Li Erran Li
Affordance grounding refers to the task of finding the area of an object with which one can interact.
1 code implementation • 4 Apr 2023 • Haitao Yang, Zaiwei Zhang, Xiangru Huang, Min Bai, Chen Song, Bo Sun, Li Erran Li, QiXing Huang
Bird's-Eye View (BEV) features are popular intermediate scene representations shared by the 3D backbone and the detector head in LiDAR-based object detectors.
no code implementations • CVPR 2023 • Zaiwei Zhang, Min Bai, Erran Li
The first task focuses on learning semantic information by sorting local groups of points in the scene into a globally consistent set of semantically meaningful clusters using contrastive learning.
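One common way to realize this kind of grouping is to assign each local point-group embedding to its nearest learnable cluster prototype, as in prototype-based contrastive clustering. The sketch below is illustrative only — the function and variable names are hypothetical and this is not the paper's exact formulation.

```python
import numpy as np

def assign_clusters(group_embs, prototypes):
    """Assign each point-group embedding to its nearest cluster prototype
    by cosine similarity (illustrative sketch, not the paper's method).
    group_embs: (G, D) embeddings of local point groups
    prototypes: (K, D) learnable cluster centres
    Returns (G,) cluster indices."""
    g = group_embs / np.linalg.norm(group_embs, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = g @ p.T  # (G, K) cosine similarities
    return sim.argmax(axis=1)

embs = np.array([[1.0, 0.0], [0.0, 1.0]])
protos = np.array([[0.9, 0.1], [0.1, 0.9]])
assignments = assign_clusters(embs, protos)
```

In a contrastive setup, these assignments would supply positive/negative pairs: groups mapped to the same prototype are pulled together, others pushed apart.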
no code implementations • 16 Dec 2022 • Dylan Sam, Min Bai, Tristan McKinney, Li Erran Li
Recent methods in self-supervised learning have demonstrated that masking-based pretext tasks extend beyond NLP, serving as useful pretraining objectives in computer vision.
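The core of a masking-based pretext task is simple: hide a random subset of input patches and train a model to reconstruct them. A toy sketch, with all names hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_patches(image_patches, mask_ratio=0.75):
    """Toy sketch of a masking pretext task: hide a random subset of
    patches; a model would then be trained to reconstruct them.
    image_patches: (N, D) flattened patches.
    Returns (visible_patches, indices_of_masked_patches)."""
    n = len(image_patches)
    n_mask = int(n * mask_ratio)
    idx = rng.permutation(n)
    mask_idx = idx[:n_mask]
    visible = np.delete(image_patches, mask_idx, axis=0)
    return visible, mask_idx

patches = np.arange(16.0).reshape(8, 2)  # 8 patches of dimension 2
visible, masked = mask_patches(patches)
```

With a 0.75 mask ratio, only a quarter of the patches are passed to the encoder; the reconstruction loss on the hidden patches provides the self-supervised training signal.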
no code implementations • 18 Jan 2021 • Min Bai, Shenlong Wang, Kelvin Wong, Ersin Yumer, Raquel Urtasun
In this paper, we introduce a non-parametric memory representation for spatio-temporal segmentation that captures the local space and time around an autonomous vehicle (AV).
no code implementations • 17 Jan 2021 • Bin Yang, Min Bai, Ming Liang, Wenyuan Zeng, Raquel Urtasun
The key idea is to decompose the 4D object label into two parts: the object's 3D size, which is fixed over time for rigid objects, and the motion path describing the evolution of the object's pose over time.
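The decomposition can be sketched as follows. This is a minimal illustration under the stated assumption (rigid objects, a pose per frame); the class and field names are hypothetical, not from the paper.

```python
import numpy as np

class TrajectoryLabel:
    """Illustrative 4D label: a rigid object's size is stored once,
    while its pose varies per frame along the motion path."""

    def __init__(self, size_lwh, poses):
        # size_lwh: (3,) fixed length/width/height of the rigid object
        # poses: (T, 4) per-frame (x, y, z, yaw) describing the motion path
        self.size = np.asarray(size_lwh, dtype=float)
        self.poses = np.asarray(poses, dtype=float)

    def box_at(self, t):
        """Reconstruct the full 7-DoF box (x, y, z, l, w, h, yaw) at frame t."""
        x, y, z, yaw = self.poses[t]
        l, w, h = self.size
        return np.array([x, y, z, l, w, h, yaw])

label = TrajectoryLabel(size_lwh=[4.5, 1.8, 1.5],
                        poses=[[0.0, 0.0, 0.0, 0.0],
                               [1.0, 0.1, 0.0, 0.05]])
box = label.box_at(1)  # size stays fixed; only the pose changes
```

Storing the size once rather than per frame removes redundancy from the annotation and enforces temporal consistency by construction.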
no code implementations • 8 Aug 2019 • Wei-Chiu Ma, Ignacio Tartavull, Ioan Andrei Bârsan, Shenlong Wang, Min Bai, Gellert Mattyus, Namdar Homayounfar, Shrinidhi Kowshika Lakshmikanth, Andrei Pokrovsky, Raquel Urtasun
In this paper we propose a novel semantic localization algorithm that exploits multiple sensors and has precision on the order of a few centimeters.
no code implementations • 4 May 2019 • Min Bai, Gellert Mattyus, Namdar Homayounfar, Shenlong Wang, Shrinidhi Kowshika Lakshmikanth, Raquel Urtasun
Reliable and accurate lane detection has been a long-standing problem in the field of autonomous driving.
1 code implementation • CVPR 2019 • Yuwen Xiong, Renjie Liao, Hengshuang Zhao, Rui Hu, Min Bai, Ersin Yumer, Raquel Urtasun
More importantly, we introduce a parameter-free panoptic head which solves panoptic segmentation via pixel-wise classification.
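One way a parameter-free head can work is to stack semantic ("stuff") logits with per-instance mask logits and take a per-pixel argmax, so no new learnable weights are introduced. The numpy sketch below shows this fusion idea in miniature; it is an assumption-laden illustration, not the authors' exact formulation.

```python
import numpy as np

def panoptic_head(sem_logits, inst_logits):
    """Parameter-free fusion sketch: concatenate per-pixel semantic logits
    for stuff classes with per-instance mask logits, then argmax.
    sem_logits:  (C_stuff, H, W) semantic logits for stuff classes
    inst_logits: (N_inst, H, W) mask logits, one channel per instance
    Returns an (H, W) map where values < C_stuff are stuff classes and
    values >= C_stuff index instances."""
    combined = np.concatenate([sem_logits, inst_logits], axis=0)
    return combined.argmax(axis=0)

sem = np.zeros((2, 2, 2)); sem[1] = 0.5           # stuff class 1 favoured
inst = np.zeros((1, 2, 2)); inst[0, 0, 0] = 2.0   # instance wins one pixel
out = panoptic_head(sem, inst)
```

Because the head contains no parameters, semantic and instance branches can be trained with their own losses while panoptic output falls out of a single classification step.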
Ranked #3 on Panoptic Segmentation on Indian Driving Dataset
2 code implementations • CVPR 2018 • Diego Marcos, Devis Tuia, Benjamin Kellenberger, Lisa Zhang, Min Bai, Renjie Liao, Raquel Urtasun
The world is covered with millions of buildings, and precisely knowing each instance's position and extents is vital to a multitude of applications.
no code implementations • ICCV 2017 • Shenlong Wang, Min Bai, Gellert Mattyus, Hang Chu, Wenjie Luo, Bin Yang, Justin Liang, Joel Cheverie, Sanja Fidler, Raquel Urtasun
In this paper we introduce the TorontoCity benchmark, which covers the full greater Toronto area (GTA) with 712.5 $km^2$ of land, 8,439 $km$ of road and around 400,000 buildings.
3 code implementations • CVPR 2017 • Min Bai, Raquel Urtasun
Most contemporary approaches to instance segmentation use complex pipelines involving conditional random fields, recurrent neural networks, object proposals, or template matching schemes.
no code implementations • 6 Apr 2016 • Min Bai, Wenjie Luo, Kaustav Kundu, Raquel Urtasun
We tackle the problem of estimating optical flow from a monocular camera in the context of autonomous driving.