2 code implementations • 7 Mar 2024 • Xiang Li, Kai Qiu, Jinglu Wang, Xiaohao Xu, Rita Singh, Kashu Yamazak, Hao Chen, Xiaonan Huang, Bhiksha Raj
Referring perception, which aims at grounding visual objects with multimodal referring guidance, is essential for bridging the gap between humans, who provide instructions, and the environment where intelligent systems perceive.
no code implementations • 14 Dec 2023 • Kai Qiu, Huishuai Zhang, Zhirong Wu, Stephen Lin
However, the model robustness, which is a critical aspect for safety, is often optimized for each specific task rather than at the pretraining stage.
no code implementations • 30 Nov 2023 • Wenming Weng, Ruoyu Feng, Yanhui Wang, Qi Dai, Chunyu Wang, Dacheng Yin, Zhiyuan Zhao, Kai Qiu, Jianmin Bao, Yuhui Yuan, Chong Luo, Yueyi Zhang, Zhiwei Xiong
Second, it preserves the high-fidelity generation ability of the pre-trained image diffusion models by making only minimal network modifications.
no code implementations • 30 Nov 2023 • Yanhui Wang, Jianmin Bao, Wenming Weng, Ruoyu Feng, Dacheng Yin, Tao Yang, Jingxu Zhang, Qi Dai Zhiyuan Zhao, Chunyu Wang, Kai Qiu, Yuhui Yuan, Chuanxin Tang, Xiaoyan Sun, Chong Luo, Baining Guo
We present MicroCinema, a straightforward yet effective framework for high-quality and coherent text-to-video generation.
no code implementations • 22 Nov 2022 • Zhongwei Qiu, Kai Qiu, Jianlong Fu, Dongmei Fu
Based on MCPC, we propose a weakly-supervised pre-training (WSP) strategy to distinguish the depth relationship between two points in an image.
no code implementations • ICCV 2019 • Chenfeng Xu, Kai Qiu, Jianlong Fu, Song Bai, Yongchao Xu, Xiang Bai
Dense crowd counting aims to predict thousands of human instances from an image, by calculating integrals of a density map over image pixels.