no code implementations • 13 Dec 2023 • Yuzhe Zhang, Jiawei Zhang, Hao Li, Zhouxia Wang, Luwei Hou, Dongqing Zou, Liheng Bian
Since text prior is important to guarantee the correctness of the restored text structure according to existing arts, we also propose a Text Diffusion Model (TDM) for text recognition which can guide IDM to generate text images with correct structures.
1 code implementation • 6 Dec 2023 • Zhouxia Wang, Ziyang Yuan, Xintao Wang, Tianshui Chen, Menghan Xia, Ping Luo, Ying Shan
Therefore, this paper presents MotionCtrl, a unified and flexible motion controller for video generation designed to effectively and independently control camera and object motion.
no code implementations • 4 Sep 2023 • Zhouxia Wang, Xintao Wang, Liangbin Xie, Zhongang Qi, Ying Shan, Wenping Wang, Ping Luo
StyleAdapter can generate high-quality images that match the content of the prompts and adopt the style of the references (even for unseen styles) in a single pass, which is more flexible and efficient than previous methods.
1 code implementation • 14 Aug 2023 • Zhouxia Wang, Jiawei Zhang, Tianshui Chen, Wenping Wang, Ping Luo
In this work, we propose RestoreFormer++, which on the one hand introduces fully-spatial attention mechanisms to model the contextual information and the interplay with the priors, and on the other hand, explores an extending degrading model to help generate more realistic degraded face images to alleviate the synthetic-to-real-world gap.
1 code implementation • CVPR 2022 • Zhouxia Wang, Jiawei Zhang, Runjian Chen, Wenping Wang, Ping Luo
Blind face restoration is to recover a high-quality face image from unknown degradations.
1 code implementation • CVPR 2020 • Zhouxia Wang, Jiawei Zhang, Mude Lin, Jiong Wang, Ping Luo, Jimmy Ren
Automatically selecting exposure bracketing (images exposed differently) is important to obtain a high dynamic range image by using multi-exposure fusion.
1 code implementation • 2 Jul 2018 • Zhouxia Wang, Tianshui Chen, Jimmy Ren, Weihao Yu, Hui Cheng, Liang Lin
And this structured knowledge can be efficiently integrated into the deep neural network architecture to promote social relationship understanding by an end-to-end trainable Graph Reasoning Model (GRM), in which a propagation mechanism is learned to propagate node message through the graph to explore the interaction between persons of interest and the contextual objects.
Ranked #2 on Visual Social Relationship Recognition on PIPA
no code implementations • 20 Dec 2017 • Tianshui Chen, Zhouxia Wang, Guanbin Li, Liang Lin
Recognizing multiple labels of images is a fundamental but challenging task in computer vision, and remarkable progress has been attained by localizing semantic-aware image regions and predicting their labels with deep convolutional neural networks.
1 code implementation • CVPR 2018 • Yue Luo, Jimmy Ren, Zhouxia Wang, Wenxiu Sun, Jinshan Pan, Jianbo Liu, Jiahao Pang, Liang Lin
Such suboptimal results are mainly attributed to the inability of imposing sequential geometric consistency, handling severe image quality degradation (e. g. motion blur and occlusion) as well as the inability of capturing the temporal correlation among video frames.
Ranked #3 on Pose Estimation on J-HMDB
no code implementations • ICCV 2017 • Zhouxia Wang, Tianshui Chen, Guanbin Li, Ruijia Xu, Liang Lin
This paper proposes a novel deep architecture to address multi-label image recognition, a fundamental and practical task towards general visual understanding.