Search Results for author: Kai Qiu

Found 6 papers, 1 paper with code

$\text{R}^2$-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations

2 code implementations • 7 Mar 2024 • Xiang Li, Kai Qiu, Jinglu Wang, Xiaohao Xu, Rita Singh, Kashu Yamazaki, Hao Chen, Xiaonan Huang, Bhiksha Raj

Referring perception, which aims at grounding visual objects with multimodal referring guidance, is essential for bridging the gap between humans, who provide instructions, and the environment where intelligent systems perceive.

Benchmarking


Exploring Transferability for Randomized Smoothing

no code implementations • 14 Dec 2023 • Kai Qiu, Huishuai Zhang, Zhirong Wu, Stephen Lin

However, the model robustness, which is a critical aspect for safety, is often optimized for each specific task rather than at the pretraining stage.


ART$\boldsymbol{\cdot}$V: Auto-Regressive Text-to-Video Generation with Diffusion Models

no code implementations • 30 Nov 2023 • Wenming Weng, Ruoyu Feng, Yanhui Wang, Qi Dai, Chunyu Wang, Dacheng Yin, Zhiyuan Zhao, Kai Qiu, Jianmin Bao, Yuhui Yuan, Chong Luo, Yueyi Zhang, Zhiwei Xiong

Second, it preserves the high-fidelity generation ability of the pre-trained image diffusion models by making only minimal network modifications.

Text-to-Video Generation, Video Generation


MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

no code implementations • 30 Nov 2023 • Yanhui Wang, Jianmin Bao, Wenming Weng, Ruoyu Feng, Dacheng Yin, Tao Yang, Jingxu Zhang, Qi Dai, Zhiyuan Zhao, Chunyu Wang, Kai Qiu, Yuhui Yuan, Chuanxin Tang, Xiaoyan Sun, Chong Luo, Baining Guo

We present MicroCinema, a straightforward yet effective framework for high-quality and coherent text-to-video generation.

Text-to-Image Generation, Text-to-Video Generation, +1


Weakly-supervised Pre-training for 3D Human Pose Estimation via Perspective Knowledge

no code implementations • 22 Nov 2022 • Zhongwei Qiu, Kai Qiu, Jianlong Fu, Dongmei Fu

Based on MCPC, we propose a weakly-supervised pre-training (WSP) strategy to distinguish the depth relationship between two points in an image.

3D Human Pose Estimation, 3D Pose Estimation


Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting

no code implementations • ICCV 2019 • Chenfeng Xu, Kai Qiu, Jianlong Fu, Song Bai, Yongchao Xu, Xiang Bai

Dense crowd counting aims to predict thousands of human instances from an image, by calculating integrals of a density map over image pixels.

Crowd Counting, Density Estimation

