1 code implementation • 21 Mar 2024 • Shenhao Zhu, Junming Leo Chen, Zuozhuo Dai, Yinghui Xu, Xun Cao, Yao Yao, Hao Zhu, Siyu Zhu
In this study, we introduce a methodology for human image animation by leveraging a 3D human parametric model within a latent diffusion framework to enhance shape alignment and motion guidance in current human generative techniques.
no code implementations • 18 Mar 2024 • Zhenghao Zhang, Zuozhuo Dai, Long Qin, Weizhi Wang
Large-scale text-to-video models have shown remarkable abilities, but their direct application in video editing remains challenging due to limited available datasets.
no code implementations • 6 Dec 2023 • Youtian Lin, Zuozhuo Dai, Siyu Zhu, Yao Yao
Moreover, the explicit deformation modeling for discretized Gaussian points ensures ultra-fast training and rendering of a 4D scene, which is comparable to the original 3DGS designed for static 3D reconstruction.
1 code implementation • 21 Nov 2023 • Zuozhuo Dai, Zhenghao Zhang, Yao Yao, Bingxue Qiu, Siyu Zhu, Long Qin, Weizhi Wang
Image animation is a key task in computer vision that aims to generate dynamic visual content from a static image.
no code implementations • 14 Jul 2023 • Zuozhuo Dai, Fangtao Shao, Qingkun Su, Zilong Dong, Siyu Zhu
In the second stage, we propose a novel decoupled video-text cross-attention module to capture fine-grained multimodal information in spatial and temporal dimensions.
no code implementations • 22 May 2023 • Zhenghao Zhang, Zhichao Wei, Shengfan Zhang, Zuozhuo Dai, Siyu Zhu
Unsupervised video object segmentation has made significant progress in recent years, but the manual annotation of video mask datasets is expensive and limits the diversity of available datasets.
no code implementations • 20 Jan 2023 • Zhenghao Zhang, Fangtao Shao, Zuozhuo Dai, Siyu Zhu
In this paper, we observe that temporal information is important as well, and we propose TAFormer to aggregate spatio-temporal features in both the transformer encoder and decoder.
no code implementations • 23 May 2022 • Xiaodong Gu, Chengzhou Tang, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Ping Tan
In the experiments, we evaluate the proposed method on both the 3D scene flow estimation and the point cloud registration tasks.
1 code implementation • CVPR 2022 • Weihao Yuan, Xiaodong Gu, Zuozhuo Dai, Siyu Zhu, Ping Tan
While recent works design increasingly complicated and powerful networks to directly regress the depth map, we take the path of CRF optimization.
Ranked #1 on Depth Prediction on Matterport3D
1 code implementation • CVPR 2022 • Xiaodong Gu, Chengzhou Tang, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Ping Tan
In the experiments, we evaluate the proposed method on both the 3D scene flow estimation and the point cloud registration tasks.
no code implementations • CVPR 2022 • Weihao Yuan, Xiaodong Gu, Zuozhuo Dai, Siyu Zhu, Ping Tan
Estimating accurate depth from a single image is challenging since the problem is inherently ambiguous and ill-posed.
1 code implementation • 24 Mar 2021 • Xiaodong Gu, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Chengzhou Tang, Zilong Dong, Ping Tan
There is increasing interest in studying the video-to-depth (V2D) problem with machine learning techniques.
3 code implementations • 22 Mar 2021 • Zuozhuo Dai, Guangyuan Wang, Weihao Yuan, Xiaoli Liu, Siyu Zhu, Ping Tan
Thus, our method can solve the problem of cluster inconsistency and be applicable to larger data sets.
Ranked #1 on Unsupervised Person Re-Identification on PersonX
no code implementations • 17 Oct 2020 • Rakesh Shrestha, Zhiwen Fan, Qingkun Su, Zuozhuo Dai, Siyu Zhu, Ping Tan
Deep learning-based 3D shape generation methods generally utilize latent features extracted from color images to encode the semantics of objects and guide the shape generation process.
4 code implementations • CVPR 2020 • Xiaodong Gu, Zhiwen Fan, Zuozhuo Dai, Siyu Zhu, Feitong Tan, Ping Tan
The deep multi-view stereo (MVS) and stereo matching approaches generally construct 3D cost volumes to regularize and regress the output depth or disparity.
Ranked #12 on Point Clouds on Tanks and Temples
5 code implementations • ICCV 2019 • Zuozhuo Dai, Mingqiang Chen, Xiaodong Gu, Siyu Zhu, Ping Tan
In this paper, we propose the Batch DropBlock (BDB) Network, a two-branch network composed of a conventional ResNet-50 as the global branch and a feature dropping branch.
Ranked #8 on Person Re-Identification on Market-1501-C