1 code implementation • 19 Nov 2023 • Ping Li, Chenhan Zhang, Zheng Yang, Xianghua Xu, Mingli Song
To this end, we present a Pair-wise Layer Attention with Spatial Masking (PLA-SM) framework for video prediction to capture the spatiotemporal dynamics, which reflect the motion trend.
no code implementations • 25 Sep 2023 • Ping Li, Yu Zhang, Li Yuan, Jian Zhao, Xianghua Xu, Xiaoqin Zhang
In particular, the gradients from the segmentation model are exploited to discover the easily confused regions, where it is difficult to distinguish objects from the background at the pixel level in a frame.
no code implementations • 22 Sep 2023 • Ping Li, Junjie Chen, Li Yuan, Xianghua Xu, Mingli Song
To alleviate expensive human labeling, semi-supervised semantic segmentation employs a few labeled images and an abundance of unlabeled images to predict a pixel-level label map of the same size as the input.
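A common way to exploit the unlabeled images in this setting is pseudo-labeling: keep only the model's confident per-pixel predictions as training targets. The sketch below is illustrative only; the paper's exact strategy is not given in this snippet, and the function name, threshold, and shapes are assumptions.

```python
import numpy as np

def pseudo_label(logits: np.ndarray, threshold: float = 0.9):
    """logits: (H, W, C) class scores -> (labels (H, W), mask of confident pixels)."""
    # Softmax over the class dimension (numerically stabilized).
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    conf = probs.max(axis=-1)            # per-pixel confidence
    labels = probs.argmax(axis=-1)       # per-pixel pseudo-label
    return labels, conf >= threshold     # train only where the mask is True

rng = np.random.default_rng(0)
labels, mask = pseudo_label(rng.standard_normal((4, 4, 3)) * 5)
```

Pixels outside the mask simply contribute no loss, so the unlabeled images help without forcing the model to fit its own uncertain guesses.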
no code implementations • 21 Sep 2023 • Ping Li, Yu Zhang, Li Yuan, Xianghua Xu
Referring Video Object Segmentation (RVOS) requires segmenting the object in a video referred to by a natural language query.
no code implementations • 21 Sep 2023 • Ping Li, Yu Zhang, Li Yuan, Huaxin Xiao, Binbin Lin, Xianghua Xu
Unsupervised Video Object Segmentation (VOS) aims at identifying the contours of primary foreground objects in videos without any prior knowledge.
Tasks: Semantic Segmentation, Unsupervised Video Object Segmentation, +1
no code implementations • 17 Jun 2023 • Ping Li, Junjie Chen, Binbin Lin, Xianghua Xu
Specifically, we employ an asymmetric encoder to learn the compensating features of the RGB and the thermal images.
Ranked #17 on Thermal Image Segmentation on MFN Dataset
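The idea of an asymmetric encoder can be sketched as two branches of different capacity, a deeper one for RGB and a lighter one for thermal, whose features are fused so the thermal stream compensates the RGB features. Everything below is a hedged illustration: layer sizes, the 1x1-convolution stand-in, and additive fusion are assumptions, not the paper's architecture.

```python
import numpy as np

def conv1x1(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Per-pixel linear projection with ReLU: (H, W, Cin) @ (Cin, Cout)."""
    return np.maximum(x @ w, 0.0)

rng = np.random.default_rng(0)
rgb = rng.standard_normal((4, 4, 3))      # 3-channel RGB input
thermal = rng.standard_normal((4, 4, 1))  # 1-channel thermal input

# Asymmetric branches: two layers for RGB, one for thermal.
rgb_feat = conv1x1(conv1x1(rgb, rng.standard_normal((3, 8))),
                   rng.standard_normal((8, 16)))
thermal_feat = conv1x1(thermal, rng.standard_normal((1, 16)))

# Simple additive fusion of the compensating features (assumed).
fused = rgb_feat + thermal_feat
```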
1 code implementation • 17 Jun 2023 • Ping Li, Chenhan Zhang, Xianghua Xu
Video prediction is a pixel-level task that generates future frames from historical frames.
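To make the pixel-level task concrete, the toy sketch below predicts the next frame by linearly extrapolating the last two frames. Real models learn this mapping; the function name and the clipping range are illustrative assumptions.

```python
import numpy as np

def predict_next_frame(frames: np.ndarray) -> np.ndarray:
    """frames: (T, H, W) history of intensities in [0, 1]; returns an (H, W) estimate of frame T+1."""
    # Continue the per-pixel linear trend, clipped to valid intensities.
    return np.clip(2 * frames[-1] - frames[-2], 0.0, 1.0)

# Two constant frames whose brightness rises from 0.2 to 0.4;
# the linear trend continues to 0.6.
history = np.stack([np.full((2, 2), 0.2), np.full((2, 2), 0.4)])
pred = predict_next_frame(history)
```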
no code implementations • 23 Sep 2020 • Ping Li, Qinghao Ye, Luming Zhang, Li Yuan, Xianghua Xu, Ling Shao
In this paper, we propose an efficient convolutional neural network architecture for video SUMmarization via Global Diverse Attention, called SUM-GDA, which adapts the attention mechanism to a global perspective in order to model pairwise temporal relations among video frames.
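The pairwise temporal relations above can be sketched as global attention over all frame pairs, with each frame scored by how much attention it receives. This is a minimal numpy illustration, not the SUM-GDA formulation; the scaling, masking, and normalization choices are assumptions.

```python
import numpy as np

def global_diverse_attention(frames: np.ndarray) -> np.ndarray:
    """frames: (T, D) frame features -> (T,) importance scores normalized to [0, 1]."""
    # Pairwise relations between all frames, scaled by feature dimension.
    sim = frames @ frames.T / np.sqrt(frames.shape[1])
    np.fill_diagonal(sim, -np.inf)                      # ignore self-attention
    attn = np.exp(sim - sim.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)             # row-wise softmax
    # A frame attended to by many other frames gets a higher score.
    scores = attn.sum(axis=0)
    return scores / scores.max()

rng = np.random.default_rng(0)
scores = global_diverse_attention(rng.standard_normal((8, 16)))
```

Frames with the highest scores would then be selected to form the summary.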