1 code implementation • 22 Apr 2024 • Yuxin Mao, Xuyang Shen, Jing Zhang, Zhen Qin, Jinxing Zhou, Mochu Xiang, Yiran Zhong, Yuchao Dai
To support research in this field, we have developed a comprehensive Text to Audible-Video Generation Benchmark (TAVGBench), which contains over 1.7 million clips with a total duration of 11.8 thousand hours.
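As a rough sanity check on the two figures quoted above, the average clip length they imply can be worked out directly. This is a minimal sketch: the function name `mean_clip_seconds` is hypothetical, and both inputs are treated as approximate because the entry says "over" 1.7 million clips.

```python
# Figures quoted in the entry above, treated as approximate values.
total_hours = 11.8e3   # 11.8 thousand hours of footage
num_clips = 1.7e6      # "over 1.7 million" clips


def mean_clip_seconds(hours: float, clips: float) -> float:
    """Return the average clip length in seconds (hypothetical helper)."""
    return hours * 3600.0 / clips


avg = mean_clip_seconds(total_hours, num_clips)
print(f"approx. {avg:.1f} s per clip")
```

Under these assumptions the clips average roughly 25 seconds each. Since the clip count is a lower bound, the true mean would be slightly shorter.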
1 code implementation • ICCV 2023 • Yuxin Mao, Jing Zhang, Mochu Xiang, Yiran Zhong, Yuchao Dai
To achieve this, our ECMVAE factorizes the representations of each modality with a modality-shared representation and a modality-specific representation.
1 code implementation • ICCV 2023 • Zhexiong Wan, Yuxin Mao, Jing Zhang, Yuchao Dai
Recently, methods that fuse RGB images and point clouds have been proposed to jointly estimate 2D optical flow and 3D scene flow.
no code implementations • 5 Sep 2023 • YuFei Wang, Yuxin Mao, Qi Liu, Yuchao Dai
The decomposed filters not only retain the favorable properties of guided dynamic filters, being content-dependent and spatially variant, but also reduce model parameters and hardware costs, as the learned adaptors are decoupled from the number of feature channels.
no code implementations • 16 Aug 2023 • Dawei Hao, Yuxin Mao, Bowen He, Xiaodong Han, Yuchao Dai, Yiran Zhong
In this paper, inspired by the human ability to mentally simulate the sound of an object and its visual appearance, we introduce a bidirectional generation framework.
no code implementations • 31 Jul 2023 • Yuxin Mao, Jing Zhang, Mochu Xiang, Yunqiu Lv, Yiran Zhong, Yuchao Dai
We propose a latent diffusion model with contrastive learning for audio-visual segmentation (AVS) to extensively explore the contribution of audio.
1 code implementation • 6 Jun 2023 • Aixuan Li, Yuxin Mao, Jing Zhang, Yuchao Dai
In particular, following the principle of disentangled representation learning, we introduce a mutual information upper bound with a mutual information minimization regularizer to encourage the disentangled representation of each modality for salient object detection.
1 code implementation • CVPR 2023 • Bin Fan, Yuxin Mao, Yuchao Dai, Zhexiong Wan, Qi Liu
Rolling shutter correction (RSC) is becoming increasingly popular for RS cameras that are widely used in commercial and industrial applications.
1 code implementation • 16 Nov 2022 • Zhexiong Wan, Yuchao Dai, Yuxin Mao
In this paper, we propose a novel deep learning-based dense and continuous optical flow estimation framework from a single image with event streams, which facilitates the accurate perception of high-speed motion.
no code implementations • 13 Oct 2022 • Yuxin Mao, Zhexiong Wan, Yuchao Dai, Xin Yu
Single image blind deblurring is highly ill-posed as neither the latent sharp image nor the blur kernel is known.
no code implementations • 29 Nov 2021 • Jiadai Sun, Yuxin Mao, Yuchao Dai, Yiran Zhong, Jianyuan Wang
The task of semi-supervised video object segmentation (VOS) has advanced greatly, with state-of-the-art performance achieved by dense matching-based methods.
2 code implementations • 20 Apr 2021 • Yuxin Mao, Jing Zhang, Zhexiong Wan, Yuchao Dai, Aixuan Li, Yunqiu Lv, Xinyu Tian, Deng-Ping Fan, Nick Barnes
For the former, we apply a transformer to a deterministic model, and explain that its effective structure modeling and global context modeling abilities lead to superior performance compared with CNN-based frameworks.
no code implementations • 14 Sep 2020 • Zhexiong Wan, Yuxin Mao, Yuchao Dai
Optical flow estimation is an important computer vision task, which aims at estimating the dense correspondences between two frames.
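To make "dense correspondences" concrete, the sketch below shows what a flow field represents once it has been estimated. It assumes the flow stores one displacement (u, v) in pixels for every pixel of the first frame. The function name `correspondences` and the toy values are hypothetical, and this is not the estimation method of the paper listed above.

```python
def correspondences(flow):
    """Map each pixel (x, y) of frame 1 to its matched position in frame 2.

    flow[y][x] is an assumed (u, v) displacement in pixels, so the match
    for pixel (x, y) is (x + u, y + v).
    """
    return [
        [(x + u, y + v) for x, (u, v) in enumerate(row)]
        for y, row in enumerate(flow)
    ]


# Toy 2x2 flow field: the top row moves one pixel to the right.
flow = [
    [(1.0, 0.0), (1.0, 0.0)],
    [(0.0, -1.0), (0.5, 0.5)],
]
matches = correspondences(flow)
print(matches[0][1])  # matched position in frame 2 for the pixel at x=1, y=0
```

"Dense" here means every pixel gets a displacement, in contrast to sparse matching, which pairs up only a set of detected keypoints.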