no code implementations • 20 Dec 2023 • Saurabh Saxena, Junhwa Hur, Charles Herrmann, Deqing Sun, David J. Fleet
In contrast, we advocate a generic, task-agnostic diffusion model with several advancements: log-scale depth parameterization to enable joint modeling of indoor and outdoor scenes, conditioning on the field of view (FOV) to handle scale ambiguity, and synthetic FOV augmentation during training to generalize beyond the limited camera intrinsics of the training datasets.
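The log-scale depth parameterization mentioned above can be sketched as follows; the depth bounds here are illustrative assumptions, not values from the paper. The idea is to map metric depths spanning indoor (sub-meter) and outdoor (tens of meters) ranges into a common normalized log-space interval so one model can handle both domains:

```python
import numpy as np

# Illustrative depth bounds (NOT the paper's values): indoor scenes can be
# well under 1 m, outdoor scenes tens of meters. Log-space normalization
# gives both regimes comparable resolution in [0, 1].
D_MIN, D_MAX = 0.5, 80.0

def depth_to_logscale(depth_m):
    """Map metric depth (meters) to a normalized log-scale value in [0, 1]."""
    depth_m = np.clip(depth_m, D_MIN, D_MAX)
    return (np.log(depth_m) - np.log(D_MIN)) / (np.log(D_MAX) - np.log(D_MIN))

def logscale_to_depth(x):
    """Invert the parameterization back to metric depth in meters."""
    return np.exp(x * (np.log(D_MAX) - np.log(D_MIN)) + np.log(D_MIN))
```

In log space, a fixed prediction error corresponds to a fixed *relative* depth error, which is one common motivation for this kind of parameterization.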
no code implementations • 7 Dec 2023 • Ethan Weber, Aleksander Hołyński, Varun Jampani, Saurabh Saxena, Noah Snavely, Abhishek Kar, Angjoo Kanazawa
In contrast to related works, we focus on completing scenes rather than deleting foreground objects, and our approach does not require tight 2D object masks or text.
no code implementations • NeurIPS 2023 • Saurabh Saxena, Charles Herrmann, Junhwa Hur, Abhishek Kar, Mohammad Norouzi, Deqing Sun, David J. Fleet
Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity.
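As background for the denoising diffusion models discussed here, the standard DDPM forward (noising) process adds Gaussian noise according to a variance schedule; a minimal sketch with an illustrative linear schedule (not this paper's specific configuration):

```python
import numpy as np

# Standard DDPM forward process: x_t = sqrt(alpha_bar_t) * x_0
# + sqrt(1 - alpha_bar_t) * eps, with alpha_bar_t the cumulative product
# of (1 - beta_t). Schedule values below are illustrative.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def noise_sample(x0, t, rng):
    """Draw a noisy sample x_t given clean data x_0 at timestep t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
```

The model is then trained to predict the noise (or the clean signal) from `x_t` and `t`, which is what yields the fidelity and diversity noted above.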
no code implementations • 28 Feb 2023 • Saurabh Saxena, Abhishek Kar, Mohammad Norouzi, David J. Fleet
To cope with the limited availability of data for supervised training, we leverage pre-training on self-supervised image-to-image translation tasks.
Ranked #22 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)
1 code implementation • ICCV 2023 • Ting Chen, Lala Li, Saurabh Saxena, Geoffrey Hinton, David J. Fleet
Panoptic segmentation assigns semantic and instance ID labels to every pixel of an image.
1 code implementation • 15 Jun 2022 • Ting Chen, Saurabh Saxena, Lala Li, Tsung-Yi Lin, David J. Fleet, Geoffrey Hinton
Despite this, we show that by formulating the output of each task as a sequence of discrete tokens with a unified interface, one can train a neural network with a single model architecture and loss function on all these tasks, with no task-specific customization.
4 code implementations • 23 May 2022 • Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J. Fleet, Mohammad Norouzi
We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding.
Ranked #17 on Text-to-Image Generation on MS COCO (using extra training data)
6 code implementations • ICLR 2022 • Ting Chen, Saurabh Saxena, Lala Li, David J. Fleet, Geoffrey Hinton
We present Pix2Seq, a simple and generic framework for object detection.
Ranked #77 on Object Detection on COCO minival (using extra training data)
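Pix2Seq's core idea of casting detection as sequence generation can be sketched by serializing a bounding box and class label into discrete tokens; the bin count and vocabulary layout below are illustrative assumptions, not the paper's exact configuration:

```python
# Pix2Seq-style serialization sketch: quantize box coordinates into discrete
# bins so a box becomes a short token sequence the model can generate
# autoregressively. NUM_BINS and the vocabulary layout are assumptions.
NUM_BINS = 1000  # coordinate quantization bins

def box_to_tokens(box, class_id, img_w, img_h):
    """box = (xmin, ymin, xmax, ymax) in pixels -> [y, x, y, x, class] tokens."""
    xmin, ymin, xmax, ymax = box

    def quantize(v, size):
        return min(int(v / size * NUM_BINS), NUM_BINS - 1)

    # Class tokens share the vocabulary, placed after the coordinate bins.
    return [quantize(ymin, img_h), quantize(xmin, img_w),
            quantize(ymax, img_h), quantize(xmax, img_w),
            NUM_BINS + class_id]
```

With this interface, detection needs no box-regression heads or matching losses; the model simply learns to emit the next token.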
no code implementations • 1 Feb 2021 • Darius Roman, Saurabh Saxena, Valentin Robu, Michael Pecht, David Flynn
In this paper, we design and evaluate a machine learning pipeline for estimation of battery capacity fade, a metric of battery health, on 179 cells cycled under various conditions.
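The paper's actual pipeline, features, and model are not reproduced here; as a generic sketch of capacity-fade estimation, a simple least-squares fit on synthetic per-cycle features (cycle count and mean discharge voltage are illustrative choices) might look like:

```python
import numpy as np

# Hypothetical sketch, NOT the paper's pipeline: fit remaining capacity
# from synthetic per-cycle features with ordinary least squares.
rng = np.random.default_rng(0)
n = 200
cycles = rng.uniform(0, 1000, n)        # charge/discharge cycle count
mean_v = rng.uniform(3.2, 3.8, n)       # mean discharge voltage (V)
# Synthetic ground truth: capacity fades roughly linearly with cycling.
capacity = 1.0 - 3e-4 * cycles + 0.01 * (mean_v - 3.5) + rng.normal(0, 0.005, n)

X = np.column_stack([np.ones(n), cycles, mean_v])   # design matrix with bias
coef, *_ = np.linalg.lstsq(X, capacity, rcond=None)
pred = X @ coef
rmse = float(np.sqrt(np.mean((pred - capacity) ** 2)))
```

A production pipeline would add feature extraction from raw voltage/current curves and uncertainty estimates, which is closer to what the paper evaluates.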
2 code implementations • EMNLP 2020 • Chitwan Saharia, William Chan, Saurabh Saxena, Mohammad Norouzi
In addition, we adapt the Imputer model for non-autoregressive machine translation and demonstrate that Imputer with just 4 generation steps can match the performance of an autoregressive Transformer baseline.
2 code implementations • ICML 2017 • Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Jie Tan, Quoc Le, Alex Kurakin
Neural networks have proven effective at solving difficult problems, but designing their architectures can be challenging, even for image classification alone.
Ranked #117 on Image Classification on CIFAR-10