3 code implementations • 16 May 2024 • Jingkang Yang, Jun Cen, Wenxuan Peng, Shuai Liu, Fangzhou Hong, Xiangtai Li, Kaiyang Zhou, Qifeng Chen, Ziwei Liu
To facilitate research in this new area, we build a richly annotated PSG-4D dataset consisting of 3K RGB-D videos with a total of 1M frames, each of which is labeled with 4D panoptic segmentation masks as well as fine-grained, dynamic scene graphs.
1 code implementation • 18 Apr 2024 • Mengyuan Liu, Zhongbin Fang, Xia Li, Joachim M. Buhmann, Xiangtai Li, Chen Change Loy
With the emergence of large-scale models trained on diverse datasets, in-context learning has emerged as a promising paradigm for multitasking, notably in natural language processing and image processing.
1 code implementation • 17 Apr 2024 • Zhichao Deng, Xiangtai Li, Xia Li, Yunhai Tong, Shen Zhao, Mengyuan Liu
By transferring the knowledge of the VLM to the 4D encoder and combining the VLM, our VG4D achieves improved recognition performance.
1 code implementation • 16 Apr 2024 • Jiangning Zhang, Chengjie Wang, Xiangtai Li, Guanzhong Tian, Zhucun Xue, Yong liu, Guansong Pang, DaCheng Tao
Moreover, current metrics such as AU-ROC have nearly reached saturation on simple datasets, which prevents a comprehensive evaluation of different methods.
1 code implementation • 11 Apr 2024 • Shaocong Long, Qianyu Zhou, Xiangtai Li, Xuequan Lu, Chenhao Ying, Yuan Luo, Lizhuang Ma, Shuicheng Yan
SPR strives to encourage the model to concentrate more on objects rather than context, consisting of two designs: Prior-Free Scanning~(PFS), and Domain Context Interchange~(DCI).
1 code implementation • 9 Apr 2024 • Haoyang He, Yuhu Bai, Jiangning Zhang, Qingdong He, Hongxu Chen, Zhenye Gan, Chengjie Wang, Xiangtai Li, Guanzhong Tian, Lei Xie
Recent advancements in anomaly detection have seen the efficacy of CNN- and transformer-based approaches.
3 code implementations • 29 Mar 2024 • Yikang Zhou, Tao Zhang, Shunping Ji, Shuicheng Yan, Xiangtai Li
Modern video segmentation methods adopt object queries to perform inter-frame association and demonstrate satisfactory performance in tracking continuously appearing objects despite large-scale motion and transient occlusion.
1 code implementation • 18 Mar 2024 • Xiaojie Li, Yibo Yang, Xiangtai Li, Jianlong Wu, Yue Yu, Bernard Ghanem, Min Zhang
To tackle these challenges, we present GenView, a controllable framework that augments the diversity of positive views leveraging the power of pretrained generative models while preserving semantics.
no code implementations • 14 Mar 2024 • Chaoyang Wang, Xiangtai Li, Henghui Ding, Lu Qi, Jiangning Zhang, Yunhai Tong, Chen Change Loy, Shuicheng Yan
In-context segmentation has drawn more attention with the introduction of vision foundation models.
2 code implementations • 1 Mar 2024 • Tao Zhang, Xiangtai Li, Haobo Yuan, Shunping Ji, Shuicheng Yan
To enable more effective processing of 3-D point cloud data by Mamba, we propose a novel Consistent Traverse Serialization to convert point clouds into 1-D point sequences while ensuring that neighboring points in the sequence are also spatially adjacent.
no code implementations • 4 Feb 2024 • Lu Qi, Yi-Wen Chen, Lehan Yang, Tiancheng Shen, Xiangtai Li, Weidong Guo, Yu Xu, Ming-Hsuan Yang
In this work, we propose a novel approach to densely ground visual entities from a long caption.
1 code implementation • 18 Jan 2024 • Shilin Xu, Haobo Yuan, Qingyu Shi, Lu Qi, Jingbo Wang, Yibo Yang, Yining Li, Kai Chen, Yunhai Tong, Bernard Ghanem, Xiangtai Li, Ming-Hsuan Yang
Segment Anything Model (SAM) is one remarkable model that can achieve generalized segmentation.
no code implementations • 18 Jan 2024 • Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Ziwei Liu, Chen Change Loy
We introduce a new task -- language-driven video inpainting, which uses natural language instructions to guide the inpainting process.
1 code implementation • 18 Jan 2024 • Xiangtai Li, Haobo Yuan, Wei Li, Henghui Ding, Size Wu, Wenwei Zhang, Yining Li, Kai Chen, Chen Change Loy
In this work, we address various segmentation tasks, each traditionally tackled by distinct or partially unified models.
1 code implementation • 16 Jan 2024 • Zhongbin Fang, Xia Li, Xiangtai Li, Shen Zhao, Mengyuan Liu
Through extensive experiments, we demonstrate that our PointMLS achieves state-of-the-art results on ModelNet-O and competitive results on regular datasets, and it is robust and effective.
1 code implementation • 5 Jan 2024 • Haobo Yuan, Xiangtai Li, Chong Zhou, Yining Li, Kai Chen, Chen Change Loy
The CLIP and Segment Anything Model (SAM) are remarkable vision foundation models (VFMs).
1 code implementation • 4 Jan 2024 • Xiangyu Zhao, Yicheng Chen, Shilin Xu, Xiangtai Li, Xinjiang Wang, Yining Li, Haian Huang
Grounding-DINO is a state-of-the-art open-set detection model that tackles multiple vision tasks including Open-Vocabulary Detection (OVD), Phrase Grounding (PG), and Referring Expression Comprehension (REC).
1 code implementation • 4 Jan 2024 • Yiran Song, Qianyu Zhou, Xiangtai Li, Deng-Ping Fan, Xuequan Lu, Lizhuang Ma
To this end, we propose Scalable Bias-Mode Attention Mask (BA-SAM) to enhance SAM's adaptability to varying image resolutions while eliminating the need for structure modifications.
1 code implementation • 31 Dec 2023 • Yue Han, Jiangning Zhang, Junwei Zhu, Xiangtai Li, Yanhao Ge, Wei Li, Chengjie Wang, Yong liu, Xiaoming Liu, Ying Tai
This work presents FaceX framework, a novel facial generalist model capable of handling diverse facial tasks simultaneously.
1 code implementation • 12 Dec 2023 • Jiangning Zhang, Xuhai Chen, Yabiao Wang, Chengjie Wang, Yong liu, Xiangtai Li, Ming-Hsuan Yang, DaCheng Tao
Following this spirit, this paper explores plain ViT architecture for MUAD.
1 code implementation • 12 Dec 2023 • Peng Lu, Tao Jiang, Yining Li, Xiangtai Li, Kai Chen, Wenming Yang
Real-time multi-person pose estimation presents significant challenges in balancing speed and precision.
Ranked #1 on Multi-Person Pose Estimation on CrowdPose (using extra training data)
1 code implementation • 11 Dec 2023 • Chong Zhou, Xiangtai Li, Chen Change Loy, Bo Dai
It is also the first SAM variant that can run at over 30 FPS on an iPhone 14.
1 code implementation • 6 Dec 2023 • Xinshun Wang, Zhongbin Fang, Xia Li, Xiangtai Li, Chen Chen, Mengyuan Liu
Under this setting, the model can perceive tasks from prompts and accomplish them without any extra task-specific head predictions or model fine-tuning.
no code implementations • 4 Dec 2023 • Yunhao Liu, Yu-Ju Tsai, Kelvin C. K. Chan, Xiangtai Li, Lu Qi, Ming-Hsuan Yang
Traditional heuristic approaches-either training models directly on these degraded images or their enhanced counterparts using face restoration techniques-have proven ineffective, primarily due to the degradation of facial features and the discrepancy in image domains.
3 code implementations • CVPR 2023 • Jingkang Yang, Wenxuan Peng, Xiangtai Li, Zujin Guo, Liangyu Chen, Bo Li, Zheng Ma, Kaiyang Zhou, Wayne Zhang, Chen Change Loy, Ziwei Liu
PVSG relates to the existing video scene graph generation (VidSGG) problem, which focuses on temporal interactions between humans and objects grounded with bounding boxes in videos.
1 code implementation • 6 Nov 2023 • Hao Zhou, Tiancheng Shen, Xu Yang, Hai Huang, Xiangtai Li, Lu Qi, Ming-Hsuan Yang
We benchmarked the proposed evaluation metrics on 12 open-vocabulary methods of three segmentation tasks.
1 code implementation • 22 Oct 2023 • Chunlei Wang, Wenquan Feng, Xiangtai Li, Guangliang Cheng, Shuchang Lyu, Binghao Liu, Lijiang Chen, Qi Zhao
While current foundational models excel at various visual language tasks, there's a noticeable absence of models specifically tailored for open-vocabulary visual grounding.
1 code implementation • 2 Oct 2023 • Shilin Xu, Xiangtai Li, Size Wu, Wenwei Zhang, Yunhai Tong, Chen Change Loy
We refer to this approach as the self-training strategy, which enhances recall and accuracy for novel classes without requiring extra annotations, datasets, and re-training.
1 code implementation • 2 Oct 2023 • Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Xiangtai Li, Wentao Liu, Chen Change Loy
However, when transferring the vision-language alignment of CLIP from global image representation to local region representation for the open-vocabulary dense prediction tasks, CLIP ViTs suffer from the domain shift from full images to local image regions.
1 code implementation • 22 Sep 2023 • Jiahao Xie, Wei Li, Xiangtai Li, Ziwei Liu, Yew Soon Ong, Chen Change Loy
We present MosaicFusion, a simple yet effective diffusion-based data augmentation approach for large vocabulary instance segmentation.
2 code implementations • 3 Aug 2023 • Yibo Yang, Haobo Yuan, Xiangtai Li, Jianlong Wu, Lefei Zhang, Zhouchen Lin, Philip Torr, DaCheng Tao, Bernard Ghanem
Beyond the normal case, long-tail class incremental learning and few-shot class incremental learning are also proposed to consider the data imbalance and data scarcity, respectively, which are common in real-world implementations and further exacerbate the well-known problem of catastrophic forgetting.
1 code implementation • 23 Jul 2023 • Menghao Li, Chunlei Wang, Wenquan Feng, Shuchang Lyu, Guangliang Cheng, Xiangtai Li, Binghao Liu, Qi Zhao
The proposed framework is evaluated on five regular VG datasets and two newly constructed robust VG datasets.
1 code implementation • 17 Jul 2023 • Jinghao Wang, Zhengyu Wen, Xiangtai Li, Zujin Guo, Jingkang Yang, Ziwei Liu
Panoptic Scene Graph (PSG) is a challenging task in Scene Graph Generation (SGG) that aims to create a more comprehensive scene graph representation using panoptic segmentation instead of boxes.
1 code implementation • 28 Jun 2023 • Jianzong Wu, Xiangtai Li, Shilin Xu, Haobo Yuan, Henghui Ding, Yibo Yang, Xia Li, Jiangning Zhang, Yunhai Tong, Xudong Jiang, Bernard Ghanem, DaCheng Tao
To our knowledge, this is the first comprehensive literature review of open vocabulary learning.
2 code implementations • NeurIPS 2023 • Zhongbin Fang, Xiangtai Li, Xia Li, Joachim M. Buhmann, Chen Change Loy, Mengyuan Liu
With the rise of large-scale models trained on broad data, in-context learning has become a new learning paradigm that has demonstrated significant potential in natural language processing and computer vision tasks.
no code implementations • 9 May 2023 • Guangliang Cheng, Yunmeng Huang, Xiangtai Li, Shuchang Lyu, Zhaoyang Xu, Qi Zhao, Shiming Xiang
We first introduce some preliminary knowledge for the change detection task, such as problem definition, datasets, evaluation metrics, and transformer basics, as well as provide a detailed taxonomy of existing algorithms from three different perspectives: algorithm granularity, supervision modes, and learning frameworks in the methodology section.
2 code implementations • 19 Apr 2023 • Xiangtai Li, Henghui Ding, Haobo Yuan, Wenwei Zhang, Jiangmiao Pang, Guangliang Cheng, Kai Chen, Ziwei Liu, Chen Change Loy
Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks.
2 code implementations • ICCV 2023 • Xiangtai Li, Haobo Yuan, Wenwei Zhang, Guangliang Cheng, Jiangmiao Pang, Chen Change Loy
Our framework is a near-online approach that takes a short subclip as input and outputs the corresponding spatial-temporal tube masks.
Ranked #3 on Video Semantic Segmentation on VSPW
1 code implementation • 6 Feb 2023 • Yibo Yang, Haobo Yuan, Xiangtai Li, Zhouchen Lin, Philip Torr, DaCheng Tao
In this paper, we deal with this misalignment dilemma in FSCIL inspired by the recently discovered phenomenon named neural collapse, which reveals that the last-layer features of the same class will collapse into a vertex, and the vertices of all classes are aligned with the classifier prototypes, which are formed as a simplex equiangular tight frame (ETF).
1 code implementation • ICLR 2023 • Yibo Yang, Haobo Yuan, Xiangtai Li, Zhouchen Lin, Philip Torr, DaCheng Tao
In this paper, we deal with this misalignment dilemma in FSCIL inspired by the recently discovered phenomenon named neural collapse, which reveals that the last-layer features of the same class will collapse into a vertex, and the vertices of all classes are aligned with the classifier prototypes, which are formed as a simplex equiangular tight frame (ETF).
Ranked #3 on Few-Shot Class-Incremental Learning on CUB-200-2011
1 code implementation • 3 Jan 2023 • Xiangtai Li, Shilin Xu, Yibo Yang, Haobo Yuan, Guangliang Cheng, Yunhai Tong, Zhouchen Lin, Ming-Hsuan Yang, DaCheng Tao
Third, inspired by Mask2Former, based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross-attention scheme to boost part segmentation qualities further.
1 code implementation • ICCV 2023 • Jiangning Zhang, Xiangtai Li, Jian Li, Liang Liu, Zhucun Xue, Boshen Zhang, Zhengkai Jiang, Tianxin Huang, Yabiao Wang, Chengjie Wang
This paper focuses on developing modern, efficient, lightweight models for dense predictions while trading off parameters, FLOPs, and performance.
1 code implementation • 3 Jan 2023 • Yue Han, Jiangning Zhang, Zhucun Xue, Chao Xu, Xintian Shen, Yabiao Wang, Chengjie Wang, Yong liu, Xiangtai Li
In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support/query features based on a Transformer-like framework.
2 code implementations • ICCV 2023 • Jianzong Wu, Xiangtai Li, Henghui Ding, Xia Li, Guangliang Cheng, Yunhai Tong, Chen Change Loy
Experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS) demonstrate the superiority of the CGG.
1 code implementation • 16 Dec 2022 • Yujing Wang, Yaming Yang, Zhuo Li, Jiangang Bai, Mingliang Zhang, Xiangtai Li, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong
To the best of our knowledge, this is the first work that explicitly models the layer-wise evolution of attention maps.
1 code implementation • 20 Sep 2022 • Jianzong Wu, Xiangtai Li, Xia Li, Henghui Ding, Yunhai Tong, DaCheng Tao
It considers the negative sentence inputs besides the regular positive text inputs.
1 code implementation • 10 Jul 2022 • Xiangtai Li, Jiangning Zhang, Yibo Yang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong, DaCheng Tao
In this paper, we focus on exploring effective methods for faster, accurate, and domain agnostic semantic segmentation.
1 code implementation • 19 Jun 2022 • Jiangning Zhang, Xiangtai Li, Yabiao Wang, Chengjie Wang, Yibo Yang, Yong liu, DaCheng Tao
Motivated by biological evolution, this paper explains the rationality of Vision Transformer by analogy with the proven practical Evolutionary Algorithm (EA) and derives that both have consistent mathematical formulation.
1 code implementation • 28 May 2022 • Yangyang Xu, Xiangtai Li, Haobo Yuan, Yibo Yang, Lefei Zhang
We first model each task with a task-relevant query.
1 code implementation • CVPR 2022 • Xiangtai Li, Wenwei Zhang, Jiangmiao Pang, Kai Chen, Guangliang Cheng, Yunhai Tong, Chen Change Loy
We hope this simple, yet effective method can serve as a new, flexible baseline in unified video segmentation design.
Ranked #1 on Video Panoptic Segmentation on KITTI-STEP (using extra training data)
1 code implementation • 10 Apr 2022 • Shilin Xu, Xiangtai Li, Jingbo Wang, Guangliang Cheng, Yunhai Tong, DaCheng Tao
This focus on joint human fashion segmentation and attribute recognition.
1 code implementation • 10 Apr 2022 • Xiangtai Li, Shilin Xu, Yibo Yang, Guangliang Cheng, Yunhai Tong, DaCheng Tao
To the best of our knowledge, we are the first to solve the PPS problem via \textit{a unified and end-to-end transformer model.
1 code implementation • 17 Mar 2022 • Yibo Yang, Shixiang Chen, Xiangtai Li, Liang Xie, Zhouchen Lin, DaCheng Tao
Modern deep neural networks for classification usually jointly learn a backbone for representation and a linear classifier to output the logit of each class.
Ranked #26 on Long-tail Learning on CIFAR-10-LT (ρ=100)
3 code implementations • 13 Jan 2022 • Qianyu Zhou, Xiangtai Li, Lu He, Yibo Yang, Guangliang Cheng, Yunhai Tong, Lizhuang Ma, DaCheng Tao
Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance as previous complex hand-crafted detectors.
Ranked #4 on Video Object Detection on ImageNet VID (using extra training data)
1 code implementation • 5 Dec 2021 • Haobo Yuan, Xiangtai Li, Yibo Yang, Guangliang Cheng, Jing Zhang, Yunhai Tong, Lefei Zhang, DaCheng Tao
The Depth-aware Video Panoptic Segmentation (DVPS) is a new challenging vision problem that aims to predict panoptic segmentation and depth in a video simultaneously.
1 code implementation • 28 Jul 2021 • Xiangtai Li, Li Zhang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong, Xiatian Zhu, Tao Xiang
Modelling long-range contextual relationships is critical for pixel-wise prediction tasks such as semantic segmentation.
1 code implementation • 28 Jul 2021 • Xiangtai Li, Hao He, Yibo Yang, Henghui Ding, Kuiyuan Yang, Guangliang Cheng, Yunhai Tong, DaCheng Tao
To incorporate both temporal and scale information, we propose a Temporal Pyramid Routing (TPR) strategy to conditionally align and conduct pixel-level aggregation from a feature pyramid pair of two adjacent frames.
no code implementations • 25 May 2021 • Chen Shi, Xiangtai Li, Yanran Wu, Yunhai Tong, Yi Xu
Representation of semantic context and local details is the essential issue for building modern semantic segmentation models.
1 code implementation • 25 May 2021 • Yanran Wu, Xiangtai Li, Chen Shi, Yunhai Tong, Yang Hua, Tao Song, Ruhui Ma, Haibing Guan
Motivated by this, we propose a novel network by aligning two-path information into each other through a learned flow field.
1 code implementation • 25 May 2021 • Hao He, Xiangtai Li, Yibo Yang, Guangliang Cheng, Yunhai Tong, Lubin Weng, Zhouchen Lin, Shiming Xiang
This module is used to squeeze the object boundary from both inner and outer directions, which contributes to precise mask representation.
1 code implementation • 23 May 2021 • Lu He, Qianyu Zhou, Xiangtai Li, Li Niu, Guangliang Cheng, Xiao Li, Wenxuan Liu, Yunhai Tong, Lizhuang Ma, Liqing Zhang
Recently, DETR and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance as previous complex hand-crafted detectors.
1 code implementation • ICCV 2021 • Hao He, Xiangtai Li, Guangliang Cheng, Jianping Shi, Yunhai Tong, Gaofeng Meng, Véronique Prinet, Lubin Weng
We use these two modules to design a decoder that generates accurate and clean segmentation results, especially on the object contours.
Ranked #20 on Thermal Image Segmentation on RGB-T-Glass-Segmentation
1 code implementation • CVPR 2021 • Xiangtai Li, Hao He, Xia Li, Duo Li, Guangliang Cheng, Jianping Shi, Lubin Weng, Yunhai Tong, Zhouchen Lin
Experimental results on three different aerial segmentation datasets suggest that the proposed method is more effective and efficient than state-of-the-art general semantic segmentation methods.
13 code implementations • CVPR 2021 • Duo Li, Jie Hu, Changhu Wang, Xiangtai Li, Qi She, Lei Zhu, Tong Zhang, Qifeng Chen
Convolution has been the core ingredient of modern neural networks, triggering the surge of deep learning in vision.
Ranked #707 on Image Classification on ImageNet
1 code implementation • 6 Nov 2020 • Xiangtai Li, Xia Li, Ansheng You, Li Zhang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong, Zhouchen Lin
Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector and perform reasoning within the single vector where the computation cost can be significantly reduced.
2 code implementations • ECCV 2020 • Xiangtai Li, Xia Li, Li Zhang, Guangliang Cheng, Jianping Shi, Zhouchen Lin, Shaohua Tan, Yunhai Tong
Our insight is that appealing performance of semantic segmentation requires \textit{explicitly} modeling the object \textit{body} and \textit{edge}, which correspond to the high and low frequency of the image.
6 code implementations • ECCV 2020 • Xiangtai Li, Ansheng You, Zhen Zhu, Houlong Zhao, Maoke Yang, Kuiyuan Yang, Yunhai Tong
A common practice to improve the performance is to attain high resolution feature maps with strong semantic representation.
Ranked #2 on Real-Time Semantic Segmentation on Cityscapes test
2 code implementations • 16 Sep 2019 • Xiangtai Li, Li Zhang, Ansheng You, Maoke Yang, Kuiyuan Yang, Yunhai Tong
GALD is end-to-end trainable and can be easily plugged into existing FCNs with various global aggregation modules for a wide range of vision tasks, and consistently improves the performance of state-of-the-art object detection and instance segmentation approaches.
Ranked #1 on Semantic Segmentation on PASCAL VOC 2007
6 code implementations • 13 Sep 2019 • Li Zhang, Xiangtai Li, Anurag Arnab, Kuiyuan Yang, Yunhai Tong, Philip H. S. Torr
Exploiting long-range contextual information is key for pixel-wise prediction tasks such as semantic segmentation.
Ranked #32 on Semantic Segmentation on Cityscapes test
2 code implementations • 3 Apr 2019 • Xiangtai Li, Houlong Zhao, Lei Han, Yunhai Tong, Kuiyuan Yang
Semantic segmentation generates comprehensive understanding of scenes through densely predicting the category for each pixel.
Ranked #29 on Semantic Segmentation on Cityscapes test