Search Results for author: Zuozhuo Dai

Found 16 papers, 8 papers with code

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

1 code implementation • 21 Mar 2024 • Shenhao Zhu, Junming Leo Chen, Zuozhuo Dai, Yinghui Xu, Xun Cao, Yao Yao, Hao Zhu, Siyu Zhu

In this study, we introduce a methodology for human image animation by leveraging a 3D human parametric model within a latent diffusion framework to enhance shape alignment and motion guidance in current human generative techniques.

Animated GIF Generation Image Animation +1

3,301

EffiVED: Efficient Video Editing via Text-instruction Diffusion Models

no code implementations • 18 Mar 2024 • Zhenghao Zhang, Zuozhuo Dai, Long Qin, Weizhi Wang

Large-scale text-to-video models have shown remarkable abilities, but their direct application in video editing remains challenging due to limited available datasets.

Video Editing

Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle

no code implementations • 6 Dec 2023 • Youtian Lin, Zuozhuo Dai, Siyu Zhu, Yao Yao

Moreover, the explicit deformation modeling for discretized Gaussian points ensures ultra-fast training and rendering of a 4D scene, which is comparable to the original 3DGS designed for static 3D reconstruction.

3D Reconstruction 4D reconstruction +1

AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance

1 code implementation • 21 Nov 2023 • Zuozhuo Dai, Zhenghao Zhang, Yao Yao, Bingxue Qiu, Siyu Zhu, Long Qin, Weizhi Wang

Image animation is a key task in computer vision that aims to generate dynamic visual content from a static image.

Image Animation Image to Video Generation

600

Fine-grained Text-Video Retrieval with Frozen Image Encoders

no code implementations • 14 Jul 2023 • Zuozhuo Dai, Fangtao Shao, Qingkun Su, Zilong Dong, Siyu Zhu

In the second stage, we propose a novel decoupled video text cross attention module to capture fine-grained multimodal information in spatial and temporal dimensions.

Decoder Retrieval +1

UVOSAM: A Mask-free Paradigm for Unsupervised Video Object Segmentation via Segment Anything Model

no code implementations • 22 May 2023 • Zhenghao Zhang, Zhichao Wei, Shengfan Zhang, Zuozhuo Dai, Siyu Zhu

Unsupervised video object segmentation has made significant progress in recent years, but the manual annotation of video mask datasets is expensive and limits the diversity of available datasets.

Image Segmentation Object +5

Towards Robust Video Instance Segmentation with Temporal-Aware Transformer

no code implementations • 20 Jan 2023 • Zhenghao Zhang, Fangtao Shao, Zuozhuo Dai, Siyu Zhu

In this paper, we observe that temporal information is important as well, and we propose TAFormer to aggregate spatio-temporal features in both the transformer encoder and decoder.

Decoder Instance Segmentation +2

RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds

no code implementations • 23 May 2022 • Xiaodong Gu, Chengzhou Tang, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Ping Tan

In the experiments, we evaluate the proposed method on both the 3D scene flow estimation and the point cloud registration task.

Motion Estimation Point Cloud Registration +1

NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation

1 code implementation • CVPR 2022 • Weihao Yuan, Xiaodong Gu, Zuozhuo Dai, Siyu Zhu, Ping Tan

While recent works design increasingly complicated and powerful networks to directly regress the depth map, we take the path of CRFs optimization.

Ranked #1 on Depth Prediction on Matterport3D

Decoder Depth Prediction +1

355

RCP: Recurrent Closest Point for Point Cloud

1 code implementation • CVPR 2022 • Xiaodong Gu, Chengzhou Tang, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Ping Tan

In the experiments, we evaluate the proposed method on both the 3D scene flow estimation and the point cloud registration task.

Motion Estimation Point Cloud Registration +1

Neural Window Fully-Connected CRFs for Monocular Depth Estimation

no code implementations • CVPR 2022 • Weihao Yuan, Xiaodong Gu, Zuozhuo Dai, Siyu Zhu, Ping Tan

Estimating the accurate depth from a single image is challenging since it is inherently ambiguous and ill-posed.

Decoder Monocular Depth Estimation

DRO: Deep Recurrent Optimizer for Video to Depth

1 code implementation • 24 Mar 2021 • Xiaodong Gu, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Chengzhou Tang, Zilong Dong, Ping Tan

There is increasing interest in studying the video-to-depth (V2D) problem with machine learning techniques.

Cluster Contrast for Unsupervised Person Re-Identification

3 code implementations • 22 Mar 2021 • Zuozhuo Dai, Guangyuan Wang, Weihao Yuan, Xiaoli Liu, Siyu Zhu, Ping Tan

Thus, our method can solve the problem of cluster inconsistency and be applicable to larger data sets.

Ranked #1 on Unsupervised Person Re-Identification on PersonX

Clustering Unsupervised Domain Adaptation +2

212

MeshMVS: Multi-View Stereo Guided Mesh Reconstruction

no code implementations • 17 Oct 2020 • Rakesh Shrestha, Zhiwen Fan, Qingkun Su, Zuozhuo Dai, Siyu Zhu, Ping Tan

Deep learning based 3D shape generation methods generally utilize latent features extracted from color images to encode the semantics of objects and guide the shape generation process.

3D Shape Generation

Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching

4 code implementations • CVPR 2020 • Xiaodong Gu, Zhiwen Fan, Zuozhuo Dai, Siyu Zhu, Feitong Tan, Ping Tan

The deep multi-view stereo (MVS) and stereo matching approaches generally construct 3D cost volumes to regularize and regress the output depth or disparity.

Ranked #12 on Point Clouds on Tanks and Temples

3D Reconstruction Point Clouds +1

669

Batch DropBlock Network for Person Re-identification and Beyond

5 code implementations • ICCV 2019 • Zuozhuo Dai, Mingqiang Chen, Xiaodong Gu, Siyu Zhu, Ping Tan

In this paper, we propose the Batch DropBlock (BDB) Network which is a two branch network composed of a conventional ResNet-50 as the global branch and a feature dropping branch.

Ranked #8 on Person Re-Identification on Market-1501-C

Image Retrieval Metric Learning +1

326
