TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Video Generation	UCF-101	PYoCo (Zero-shot, 64x64, unconditional)	Inception Score	60.01	# 9
Video Generation	UCF-101	PYoCo (Zero-shot, 64x64, unconditional)	FVD16	310	# 13
Video Generation	UCF-101	PYoCo (Zero-shot, 64x64, text-conditional)	Inception Score	47.76	# 15
Video Generation	UCF-101	PYoCo (Zero-shot, 64x64, text-conditional)	FVD16	355.19	# 19
Text-to-Video Generation	UCF-101	PYoCo (Zero-shot, 64x64)	FVD16	355.19	# 8

Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models

ICCV 2023 · Songwei Ge, Seungjun Nah, Guilin Liu, Tyler Poon, Andrew Tao, Bryan Catanzaro, David Jacobs, Jia-Bin Huang, Ming-Yu Liu, Yogesh Balaji

Despite tremendous progress in generating high-quality images using diffusion models, synthesizing a sequence of animated frames that are both photorealistic and temporally coherent is still in its infancy. While off-the-shelf billion-scale datasets for image generation are available, collecting similar video data of the same scale is still challenging. Also, training a video diffusion model is computationally much more expensive than its image counterpart. In this work, we explore finetuning a pretrained image diffusion model with video data as a practical solution for the video synthesis task. We find that naively extending the image noise prior to video noise prior in video diffusion leads to sub-optimal performance. Our carefully designed video noise prior leads to substantially better performance. Extensive experimental validation shows that our model, Preserve Your Own Correlation (PYoCo), attains SOTA zero-shot text-to-video results on the UCF-101 and MSR-VTT benchmarks. It also achieves SOTA video generation quality on the small-scale UCF-101 benchmark with a $10\times$ smaller model using significantly less computation than the prior art.
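
The abstract contrasts a naive extension of the image noise prior (independent noise for every frame) with a video noise prior that keeps frames correlated, but it does not spell out the construction. As a rough illustration only, the sketch below assumes one simple way to obtain temporally correlated noise: each frame's noise is the sum of a component shared by all frames and a per-frame independent component, scaled so that every element keeps unit variance. The function names `mixed_noise` and `correlation` and the parameter `alpha` are hypothetical and chosen for this example; this is not a statement of the exact prior used in the paper.

```python
import math
import random


def mixed_noise(num_frames, frame_size, alpha, rng):
    """Sample per-frame Gaussian noise that shares a common component.

    Assumption for illustration: noise = shared + independent, with
    variances alpha^2 / (1 + alpha^2) and 1 / (1 + alpha^2), so each
    element keeps unit variance. alpha = 0 gives independent frames.
    """
    shared_std = math.sqrt(alpha ** 2 / (1.0 + alpha ** 2))
    indep_std = math.sqrt(1.0 / (1.0 + alpha ** 2))
    # One shared draw per pixel position, reused by every frame.
    shared = [rng.gauss(0.0, shared_std) for _ in range(frame_size)]
    return [
        [s + rng.gauss(0.0, indep_std) for s in shared]
        for _ in range(num_frames)
    ]


def correlation(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)
# alpha = 0: the naive prior, every frame drawn independently.
independent = mixed_noise(num_frames=2, frame_size=20000, alpha=0.0, rng=rng)
# alpha = 2: frames share most of their noise.
correlated = mixed_noise(num_frames=2, frame_size=20000, alpha=2.0, rng=rng)

print("independent frames:", round(correlation(independent[0], independent[1]), 2))
print("correlated frames: ", round(correlation(correlated[0], correlated[1]), 2))
```

Under this assumed construction, the expected correlation between any two frames is `alpha**2 / (1 + alpha**2)`, so `alpha = 0` recovers independent per-frame noise and `alpha = 2` gives a correlation of about 0.8, while the marginal distribution of each frame stays standard normal.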

Code

No code implementations yet.

Tasks

Image Generation

Text-to-Video Generation

Video Generation

Datasets

UCF101

MSR-VTT

Results from the Paper

Ranked #8 on Text-to-Video Generation on UCF-101

Methods

Diffusion
