no code implementations • 28 Nov 2023 • Peidong Jia, Chenxuan Li, Yuhui Yuan, Zeyu Liu, Yichao Shen, Bohan Chen, Xingru Chen, Yinglin Zheng, Dong Chen, Ji Li, Xiaodong Xie, Shanghang Zhang, Baining Guo
Our COLE system comprises multiple fine-tuned Large Language Models (LLMs), Large Multimodal Models (LMMs), and Diffusion Models (DMs), each specifically tailored for design-aware layer-wise captioning, layout planning, reasoning, and the task of generating images and text.
1 code implementation • 22 May 2023 • Renshuai Liu, Chengyang Li, Haitao Cao, Yinglin Zheng, Ming Zeng, Xuan Cheng
In the second stage, we tune the imitator network by optimizing the style code, in order to find an optimal fusion result for each input pair.
no code implementations • 24 Mar 2023 • PengFei Liu, Wenjin Deng, Hengda Li, Jintai Wang, Yinglin Zheng, Yiwei Ding, Xiaohu Guo, Ming Zeng
In this paper, we present a method for this task with natural motions of the lip, facial expression, head pose, and eye states.
no code implementations • CVPR 2023 • Xiaoyi Dong, Jianmin Bao, Yinglin Zheng, Ting Zhang, Dongdong Chen, Hao Yang, Ming Zeng, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu
Second, masked self-distillation is also consistent with vision-language contrastive from the perspective of training objective as both utilize the visual encoder for feature aligning, and thus is able to learn local semantics getting indirect supervision from the language.
1 code implementation • 22 Jun 2022 • Yiwei Ding, Wenjin Deng, Yinglin Zheng, PengFei Liu, Meihong Wang, Xuan Cheng, Jianmin Bao, Dong Chen, Ming Zeng
In this paper, we present the Intra- and Inter-Human Relation Networks (I^2R-Net) for Multi-Person Pose Estimation.
Ranked #2 on Multi-Person Pose Estimation on OCHuman
2 code implementations • CVPR 2022 • Yinglin Zheng, Hao Yang, Ting Zhang, Jianmin Bao, Dongdong Chen, Yangyu Huang, Lu Yuan, Dong Chen, Ming Zeng, Fang Wen
In this paper, we study the transfer performance of pre-trained models on face analysis tasks and introduce a framework, called FaRL, for general Facial Representation Learning in a visual-linguistic manner.
Ranked #1 on Face Parsing on CelebAMask-HQ (using extra training data)
1 code implementation • ICCV 2021 • Yinglin Zheng, Jianmin Bao, Dong Chen, Ming Zeng, Fang Wen
The first stage is a fully temporal convolution network (FTCN).
Ranked #4 on DeepFake Detection on FakeAVCeleb