1 code implementation • CVPR 2023 • Luyang Zhu, Dawei Yang, Tyler Zhu, Fitsum Reda, William Chan, Chitwan Saharia, Mohammad Norouzi, Ira Kemelmacher-Shlizerman
Given two images depicting a person and a garment worn by another person, our goal is to generate a visualization of how the garment might look on the input person.
no code implementations • 17 Apr 2023 • Shekoofeh Azizi, Simon Kornblith, Chitwan Saharia, Mohammad Norouzi, David J. Fleet
Deep generative models are becoming increasingly powerful, now generating diverse high fidelity photo-realistic samples given text prompts.
no code implementations • 15 Feb 2023 • Hshmat Sahak, Daniel Watson, Chitwan Saharia, David Fleet
Diffusion models have shown promising results on single-image super-resolution and other image- to-image translation tasks.
1 code implementation • 20 Dec 2022 • Rosanne Liu, Dan Garrette, Chitwan Saharia, William Chan, Adam Roberts, Sharan Narang, Irina Blok, RJ Mical, Mohammad Norouzi, Noah Constant
In the text-only domain, we find that character-aware models provide large gains on a novel spelling task (WikiSpell).
no code implementations • CVPR 2023 • Su Wang, Chitwan Saharia, Ceslee Montgomery, Jordi Pont-Tuset, Shai Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J. Fleet, Radu Soricut, Jason Baldridge, Mohammad Norouzi, Peter Anderson, William Chan
Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.
no code implementations • 5 Oct 2022 • Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P. Kingma, Ben Poole, Mohammad Norouzi, David J. Fleet, Tim Salimans
We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models.
Ranked #1 on Video Generation on LAION-400M
no code implementations • 29 Sep 2022 • Wenhu Chen, Hexiang Hu, Chitwan Saharia, William W. Cohen
To further evaluate the capabilities of the model, we introduce EntityDrawBench, a new benchmark that evaluates image generation for diverse entities, from frequent to rare, across multiple object categories including dogs, foods, landmarks, birds, and characters.
Ranked #3 on Text-to-Image Generation on MS COCO
4 code implementations • 23 May 2022 • Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi
We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding.
Ranked #17 on Text-to-Image Generation on MS COCO (using extra training data)
no code implementations • CVPR 2022 • Jay Whang, Mauricio Delbracio, Hossein Talebi, Chitwan Saharia, Alexandros G. Dimakis, Peyman Milanfar
Unlike existing techniques, we train a stochastic sampler that refines the output of a deterministic predictor and is capable of producing a diverse set of plausible reconstructions for a given input.
5 code implementations • 10 Nov 2021 • Chitwan Saharia, William Chan, Huiwen Chang, Chris A. Lee, Jonathan Ho, Tim Salimans, David J. Fleet, Mohammad Norouzi
We expect this standardized evaluation protocol to play a role in advancing image-to-image translation research.
Ranked #1 on Colorization on ImageNet ctest10k
no code implementations • 30 May 2021 • Jonathan Ho, Chitwan Saharia, William Chan, David J. Fleet, Mohammad Norouzi, Tim Salimans
We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation benchmark, without any assistance from auxiliary image classifiers to boost sample quality.
Ranked #5 on Image Generation on ImageNet 64x64
4 code implementations • 15 Apr 2021 • Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, Mohammad Norouzi
We present SR3, an approach to image Super-Resolution via Repeated Refinement.
2 code implementations • EMNLP 2020 • Chitwan Saharia, William Chan, Saurabh Saxena, Mohammad Norouzi
In addition, we adapt the Imputer model for non-autoregressive machine translation and demonstrate that Imputer with just 4 generation steps can match the performance of an autoregressive Transformer baseline.
1 code implementation • ICML 2020 • William Chan, Chitwan Saharia, Geoffrey Hinton, Mohammad Norouzi, Navdeep Jaitly
This paper presents the Imputer, a neural sequence model that generates output sequences iteratively via imputations.
no code implementations • 2 Feb 2020 • Konrad Zolna, Chitwan Saharia, Leonard Boussioux, David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Yoshua Bengio
In adversarial imitation learning, a discriminator is trained to differentiate agent episodes from expert demonstrations representing the desired behavior.
6 code implementations • ICLR 2019 • Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, Yoshua Bengio
Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and scientific reasons, but given the poor data efficiency of the current learning methods, this goal may require substantial research efforts.