1 code implementation • 1 Apr 2024 • Cheng Lu, Jiusun Zeng, Yu Xia, Jinhui Cai, Shihua Luo
As a favorable tool for explainable artificial intelligence (XAI), the Shapley value has been widely used to interpret deep-learning-based predictive models.
Explainable Artificial Intelligence (XAI)
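For readers unfamiliar with the Shapley value mentioned above, the following is a minimal sketch of the textbook definition: each player's (feature's) attribution is its marginal contribution averaged over all orderings. It is not the method of the listed paper; the function names and the toy value function are illustrative assumptions, and exact enumeration is only practical for a handful of players.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when it joins the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_value(coalition):
    """Hypothetical value function: 'a' and 'b' contribute alone and jointly, 'c' never does."""
    v = 0.0
    if "a" in coalition:
        v += 2.0
    if "b" in coalition:
        v += 1.0
    if "a" in coalition and "b" in coalition:
        v += 1.0  # interaction term, split equally between 'a' and 'b'
    return v


phi = shapley_values(["a", "b", "c"], toy_value)
```

In this toy game the attributions sum to the value of the full coalition, which is the efficiency property that makes the Shapley value attractive for model interpretation.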
no code implementations • 3 Mar 2024 • Tianhua Qi, Wenming Zheng, Cheng Lu, Yuan Zong, Hailun Lian
In this paper, we propose Prosody-aware VITS (PAVITS) for emotional voice conversion (EVC), aiming to achieve two major objectives of EVC: high content naturalness and high emotional naturalness, which are crucial for meeting the demands of human perception.
2 code implementations • 1 Feb 2024 • Haozhe Ji, Cheng Lu, Yilin Niu, Pei Ke, Hongning Wang, Jun Zhu, Jie Tang, Minlie Huang
We prove that EXO is guaranteed to optimize in the same direction as the RL algorithms asymptotically for arbitrary parametrization of the policy, while enabling efficient optimization by circumventing the complexities associated with RL algorithms.
no code implementations • 30 Jan 2024 • Chen Bai, Zeman Shao, Guoxiang Zhang, Di Liang, Jie Yang, Zhuorui Zhang, Yujian Guo, Chengzhang Zhong, Yiqiao Qiu, Zhendong Wang, Yichen Guan, Xiaoyin Zheng, Tao Wang, Cheng Lu
Our proposed general framework encompasses three key processes: 1) integrating a realistic object into a given scene video with proper placement to ensure geometric realism; 2) estimating the sky and environmental lighting distribution and simulating realistic shadows to enhance the light realism; 3) employing a style transfer network that refines the final video output to maximize photorealism.
no code implementations • 19 Jan 2024 • Yong Wang, Cheng Lu, Hailun Lian, Yan Zhao, Björn Schuller, Yuan Zong, Wenming Zheng
These segment-level patches are then encoded using a stack of Swin blocks, in which a local window Transformer is utilized to explore local inter-frame emotional information across frame patches of each segment patch.
no code implementations • 18 Jan 2024 • Cheng Lu, Yuan Zong, Hailun Lian, Yan Zhao, Björn Schuller, Wenming Zheng
In speaker-independent speech emotion recognition, the training and testing samples are collected from diverse speakers, leading to a multi-domain shift challenge across the feature distributions of data from different speakers.
no code implementations • 2 Nov 2023 • Shen Nie, Hanzhong Allan Guo, Cheng Lu, Yuhao Zhou, Chenyu Zheng, Chongxuan Li
We present a unified probabilistic formulation for diffusion-based image editing, where a latent variable is edited in a task-specific manner and generally deviates from the corresponding marginal distribution induced by the original stochastic or ordinary differential equation (SDE or ODE).
1 code implementation • NeurIPS 2023 • Kaiwen Zheng, Cheng Lu, Jianfei Chen, Jun Zhu
In this work, we propose a novel formulation towards the optimal parameterization during sampling that minimizes the first-order discretization error of the ODE solution.
1 code implementation • 11 Oct 2023 • Huayu Chen, Cheng Lu, Zhengyi Wang, Hang Su, Jun Zhu
Recent developments in offline reinforcement learning have uncovered the immense potential of diffusion modeling, which excels at representing heterogeneous behavior policies.
no code implementations • 7 Oct 2023 • Jie Zhu, Yuan Zong, Jingang Shi, Cheng Lu, Hongli Chang, Wenming Zheng
This paper focuses on micro-expression recognition (MER) and proposes a flexible and reliable deep learning method called learning to rank onset-occurring-offset representations (LTR3O).
1 code implementation • 28 May 2023 • Zihan Chen, Lei Nico Zheng, Cheng Lu, Jialu Yuan, Di Zhu
However, its potential for inferring dynamic network structures from temporal textual data, specifically financial news, remains an unexplored frontier.
2 code implementations • NeurIPS 2023 • Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, Jun Zhu
In comparison, VSD works well with various CFG weights as ancestral sampling from diffusion models and simultaneously improves the diversity and sample quality with a common CFG weight (i.e., $7.5$).
1 code implementation • 6 May 2023 • Kaiwen Zheng, Cheng Lu, Jianfei Chen, Jun Zhu
The probability flow ordinary differential equation (ODE) of diffusion models (i.e., diffusion ODEs) is a particular case of continuous normalizing flows (CNFs), which enables deterministic inference and exact likelihood evaluation.
Ranked #1 on Image Generation on ImageNet 32x32 (bpd metric)
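As background for the entry above, the probability flow ODE and its likelihood formula are commonly written as follows. This is the standard form from the score-based diffusion literature, not a formula quoted from this listing; the symbols $f$ (drift), $g$ (diffusion coefficient), and $p_t$ (marginal density at time $t$) are notation assumed here.

```latex
\frac{\mathrm{d}x_t}{\mathrm{d}t}
  = h(x_t, t)
  := f(x_t, t) - \tfrac{1}{2}\, g(t)^2 \,\nabla_{x} \log p_t(x_t),
\qquad
\log p_0(x_0)
  = \log p_T(x_T) + \int_0^T \nabla_{x} \cdot h(x_t, t)\,\mathrm{d}t .
```

The second identity is the instantaneous change-of-variables formula for continuous normalizing flows, which is what makes exact likelihood evaluation possible along the deterministic trajectory.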
3 code implementations • 25 Apr 2023 • Cheng Lu, Huayu Chen, Jianfei Chen, Hang Su, Chongxuan Li, Jun Zhu
The main challenge for this setting is that the intermediate guidance during the diffusion sampling procedure, which is jointly defined by the sampling distribution and the energy function, is unknown and is hard to estimate.
no code implementations • 25 Nov 2022 • Cheng Lyu, Jiake Xie, Bo Xu, Cheng Lu, Han Huang, Xin Huang, Ming Wu, Chuang Zhang, Yong Tang
The performance of trimap-free image matting methods is limited when trying to decouple the deterministic and undetermined regions, especially in scenes where foregrounds are semantically ambiguous, chromaless, or highly transmissive.
1 code implementation • 2 Nov 2022 • Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu
The commonly-used fast sampler for guided sampling is DDIM, a first-order diffusion ODE solver that generally needs 100 to 250 steps for high-quality samples.
no code implementations • 22 Oct 2022 • Cheng Lu, Wenming Zheng, Hailun Lian, Yuan Zong, Chuangao Tang, Sunan Li, Yan Zhao
The F-Encoder and T-Encoder model the correlations within frequency bands and time frames, respectively, and they are embedded into a time-frequency joint learning strategy to obtain the time-frequency patterns for speech emotions.
1 code implementation • 29 Sep 2022 • Huayu Chen, Cheng Lu, Chengyang Ying, Hang Su, Jun Zhu
To address this problem, we adopt a generative approach by decoupling the learned policy into two parts: an expressive generative behavior model and an action evaluation model.
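The decoupling described above can be illustrated with a small sketch: a behavior model proposes candidate actions, and a separate evaluation model reweights them. This is only an illustration under stated assumptions, not the paper's implementation; `behavior_model`, `action_value`, the softmax-style reweighting, and all parameters are hypothetical stand-ins for learned components.

```python
import math
import random


def behavior_model(state, rng):
    """Hypothetical stand-in for a learned generative behavior model."""
    return rng.gauss(state, 1.0)


def action_value(state, action):
    """Hypothetical stand-in for a learned action evaluation model (critic)."""
    return -((action - (state + 1.0)) ** 2)


def select_action(state, rng, num_candidates=32, temperature=1.0):
    """Sample candidates from the behavior model, then resample by evaluated value."""
    candidates = [behavior_model(state, rng) for _ in range(num_candidates)]
    scores = [action_value(state, a) / temperature for a in candidates]
    top = max(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


action = select_action(0.0, random.Random(0))
```

Because every candidate comes from the behavior model, the selected action stays within the support of the data, while the evaluation model only decides which candidate to prefer.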
1 code implementation • 26 Aug 2022 • Tao Sun, Cheng Lu, Haibin Ling
We show that this strategy is more efficient and better correlated with the objective of boosting prediction confidence than adversarial training on input images or intermediate features, as used in previous works.
1 code implementation • ICCV 2023 • Tao Sun, Cheng Lu, Haibin Ling
In this paper, we propose a Local context-aware ADA framework, named LADA, to address this issue.
1 code implementation • 18 Jul 2022 • Tao Sun, Cheng Lu, Haibin Ling
We propose a general rectification module that uses such prior knowledge to refine model-generated pseudo labels.
no code implementations • 15 Jul 2022 • Jianwei Lin, Jiatai Lin, Cheng Lu, Hao Chen, Huan Lin, Bingchao Zhao, Zhenwei Shi, Bingjiang Qiu, Xipeng Pan, Zeyan Xu, Biao Huang, Changhong Liang, Guoqiang Han, Zaiyi Liu, Chu Han
To bridge the gap between Transformer and CNN features, we propose a Trans&CNN Feature Calibration block (TCFC) in the decoder.
1 code implementation • 6 Jul 2022 • Runyu Mao, Chen Bai, Yatong An, Fengqing Zhu, Cheng Lu
To the best of our knowledge, 3DG-STFM is the first student-teacher learning method for the local feature matching task.
1 code implementation • 16 Jun 2022 • Cheng Lu, Kaiwen Zheng, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu
To fill this gap, we show that the negative likelihood of the ODE can be bounded by controlling the first-, second-, and third-order score matching errors; we further present a novel high-order denoising score matching method to enable maximum likelihood training of score-based diffusion ODEs.
2 code implementations • 2 Jun 2022 • Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu
In this work, we propose an exact formulation of the solution of diffusion ODEs.
no code implementations • 20 Apr 2022 • Bo Xu, Jiake Xie, Han Huang, Ziwen Li, Cheng Lu, Yong Tang, Yandong Guo
In this paper, we propose a Situational Perception Guided Image Matting (SPG-IM) method that mitigates subjective bias of matting annotations and captures sufficient situational perception information for better global saliency distilled from the visual-to-textual task.
1 code implementation • CVPR 2022 • Tao Sun, Cheng Lu, Tianshuo Zhang, Haibin Ling
Unsupervised Domain Adaptation (UDA) aims to leverage a label-rich source domain to solve tasks on a related unlabeled target domain.
no code implementations • 13 Apr 2022 • Chu Han, Xipeng Pan, Lixu Yan, Huan Lin, Bingbing Li, Su Yao, Shanshan Lv, Zhenwei Shi, Jinhai Mai, Jiatai Lin, Bingchao Zhao, Zeyan Xu, Zhizhen Wang, Yumeng Wang, Yuan Zhang, Huihui Wang, Chao Zhu, Chunhui Lin, Lijian Mao, Min Wu, Luwen Duan, Jingsong Zhu, Dong Hu, Zijie Fang, Yang Chen, Yongbing Zhang, Yi Li, Yiwen Zou, Yiduo Yu, Xiaomeng Li, Haiming Li, Yanfen Cui, Guoqiang Han, Yan Xu, Jun Xu, Huihua Yang, Chunming Li, Zhenbing Liu, Cheng Lu, Xin Chen, Changhong Liang, Qingling Zhang, Zaiyi Liu
According to the technical reports of the top-tier teams, CAM is still the most popular approach in WSSS.
Data Augmentation, Weakly supervised Semantic Segmentation
no code implementations • 8 Mar 2022 • Bo Xu, Guanze Liu, Han Huang, Cheng Lu, Yandong Guo
Most existing CNN-based salient object detection methods can identify local segmentation details like hair and animal fur, but often misinterpret the real saliency due to the lack of global contextual information caused by the subjectiveness of the SOD task and the locality of convolution layers.
no code implementations • 28 Jan 2022 • Changwei Xu, Jianfei Yang, Haoran Tang, Han Zou, Cheng Lu, Tianshuo Zhang
Unsupervised Domain Adaptation (UDA), a branch of transfer learning where labels for target samples are unavailable, has been widely researched and developed in recent years with the help of adversarially trained models.
no code implementations • 22 Oct 2021 • Ziwen Li, Bo Xu, Han Huang, Cheng Lu, Yandong Guo
In this paper, we propose a new framework Deep Two-Stream Video Inference for Human Body Pose and Shape Estimation (DTS-VIBE), to generate 3D human pose and mesh from RGB videos.
Ranked #43 on 3D Human Pose Estimation on 3DPW
1 code implementation • ICCV 2021 • Bo Xu, Han Huang, Cheng Lu, Ziwen Li, Yandong Guo
In this paper, we propose a Virtual Multi-modality Foreground Matting (VMFM) method to learn the human-object interactive foreground (a human and the objects they interact with) from a raw RGB image.
1 code implementation • ICLR 2021 • Cheng Lu, Jianfei Chen, Chongxuan Li, Qiuhao Wang, Jun Zhu
Through theoretical analysis, we show that the function space of ImpFlow is strictly richer than that of ResFlows.
no code implementations • 13 Aug 2020 • Xingxun Jiang, Yuan Zong, Wenming Zheng, Chuangao Tang, Wanchuang Xia, Cheng Lu, Jiateng Liu
Experimental results show that DFEW is a well-designed and challenging database, and the proposed EC-STFL can promisingly improve the performance of existing spatiotemporal deep neural networks in coping with the problem of dynamic FER in the wild.
Ranked #17 on Dynamic Facial Expression Recognition on DFEW
Dynamic Facial Expression Recognition, Facial Expression Recognition
2 code implementations • CVPR 2020 • Bo Xu, Cheng Lu, Yandong Guo, Jacob Wang
Vision is often used as a complementary modality for audio speech recognition (ASR), especially in noisy environments where the performance of the audio modality alone deteriorates significantly.
Ranked #6 on Audio-Visual Speech Recognition on LRS3-TED (using extra training data)
no code implementations • 7 Apr 2020 • Zhecan Wang, Jian Zhao, Cheng Lu, Han Huang, Fan Yang, Lianji Li, Yandong Guo
To better demonstrate the advantage of our methods, we further propose a new benchmark dataset with the richest distribution of head-gaze combinations, reflecting real-world scenarios.
1 code implementation • ICML 2020 • Jianfei Chen, Cheng Lu, Biqi Chenli, Jun Zhu, Tian Tian
Generative flows are promising tractable models for density modeling that define probabilistic distributions with invertible transformations.
Ranked #30 on Image Generation on CIFAR-10 (bits/dimension metric)
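The tractability mentioned in the entry above comes from the change-of-variables formula. As a reminder, in notation assumed here ($f$ is the invertible transformation mapping data $x$ to a latent $z = f(x)$ with a simple base density $p_Z$):

```latex
\log p_X(x) = \log p_Z\big(f(x)\big)
  + \log \left| \det \frac{\partial f(x)}{\partial x} \right| .
```

Flow architectures are typically designed so that both $f^{-1}$ and the Jacobian determinant are cheap to compute.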
1 code implementation • 8 Dec 2019 • Fan Yang, Cheng Lu, Yandong Guo, Longin Jan Latecki, Haibin Ling
The feature pyramid architecture has been broadly adopted in object detection and segmentation to deal with the multi-scale problem.
1 code implementation • NeurIPS 2019 • Andrey Kolobov, Yuval Peres, Cheng Lu, Eric J. Horvitz
From traditional Web search engines to virtual assistants and Web accelerators, services that rely on online information need to continually keep track of remote content changes by explicitly requesting content updates from remote sources (e.g., web pages).
no code implementations • CVPR 2017 • Yandong Guo, Cheng Lu, Jan P. Allebach, Charles A. Bouman
Experimental results with a variety of document images demonstrate that our method improves the image quality compared with the observed image, and simultaneously improves the compression ratio.