no code implementations • 20 Dec 2023 • Saurabh Saxena, Junhwa Hur, Charles Herrmann, Deqing Sun, David J. Fleet
In contrast, we advocate a generic, task-agnostic diffusion model with several advancements: log-scale depth parameterization to enable joint modeling of indoor and outdoor scenes, conditioning on the field of view (FOV) to handle scale ambiguity, and synthetic FOV augmentation during training to generalize beyond the limited camera intrinsics of the training datasets.
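The log-scale depth parameterization mentioned above can be sketched as follows; the depth bounds here are illustrative assumptions, not values from the paper. The idea is to map metric depths spanning indoor (sub-meter) and outdoor (tens of meters) ranges into a common normalized log-space interval so one model can handle both domains:

```python
import numpy as np

# Illustrative depth bounds (NOT the paper's values): indoor scenes can be
# well under 1 m, outdoor scenes tens of meters. Log-space normalization
# gives both regimes comparable resolution in [0, 1].
D_MIN, D_MAX = 0.5, 80.0

def depth_to_logscale(depth_m):
    """Map metric depth (meters) to a normalized log-scale value in [0, 1]."""
    depth_m = np.clip(depth_m, D_MIN, D_MAX)
    return (np.log(depth_m) - np.log(D_MIN)) / (np.log(D_MAX) - np.log(D_MIN))

def logscale_to_depth(x):
    """Invert the parameterization back to metric depth in meters."""
    return np.exp(x * (np.log(D_MAX) - np.log(D_MIN)) + np.log(D_MIN))
```

In log space, a fixed prediction error corresponds to a fixed *relative* depth error, which is one common motivation for this kind of parameterization.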
no code implementations • 7 Dec 2023 • Ethan Weber, Aleksander Hołyński, Varun Jampani, Saurabh Saxena, Noah Snavely, Abhishek Kar, Angjoo Kanazawa
In contrast to related works, we focus on completing scenes rather than deleting foreground objects, and our approach does not require tight 2D object masks or text.
no code implementations • NeurIPS 2023 • Saurabh Saxena, Charles Herrmann, Junhwa Hur, Abhishek Kar, Mohammad Norouzi, Deqing Sun, David J. Fleet
Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity.
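As background for the denoising diffusion models discussed here, the standard DDPM forward (noising) process adds Gaussian noise according to a variance schedule; a minimal sketch with an illustrative linear schedule (not this paper's specific configuration):

```python
import numpy as np

# Standard DDPM forward process: x_t = sqrt(alpha_bar_t) * x_0
# + sqrt(1 - alpha_bar_t) * eps, with alpha_bar_t the cumulative product
# of (1 - beta_t). Schedule values below are illustrative.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def noise_sample(x0, t, rng):
    """Draw a noisy sample x_t given clean data x_0 at timestep t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
```

The model is then trained to predict the noise (or the clean signal) from `x_t` and `t`, which is what yields the fidelity and diversity noted above.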
no code implementations • 28 Feb 2023 • Saurabh Saxena, Abhishek Kar, Mohammad Norouzi, David J. Fleet
To cope with the limited availability of data for supervised training, we leverage pre-training on self-supervised image-to-image translation tasks.
Ranked #22 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)
1 code implementation • ICCV 2023 • Ting Chen, Lala Li, Saurabh Saxena, Geoffrey Hinton, David J. Fleet
Panoptic segmentation assigns semantic and instance ID labels to every pixel of an image.
1 code implementation • 15 Jun 2022 • Ting Chen, Saurabh Saxena, Lala Li, Tsung-Yi Lin, David J. Fleet, Geoffrey Hinton
Despite this, we show that by formulating the output of each task as a sequence of discrete tokens with a unified interface, one can train a neural network with a single model architecture and loss function on all these tasks, with no task-specific customization.
4 code implementations • 23 May 2022 • Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J. Fleet, Mohammad Norouzi
We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding.
Ranked #17 on Text-to-Image Generation on MS COCO (using extra training data)
6 code implementations • ICLR 2022 • Ting Chen, Saurabh Saxena, Lala Li, David J. Fleet, Geoffrey Hinton
We present Pix2Seq, a simple and generic framework for object detection.
Ranked #77 on Object Detection on COCO minival (using extra training data)
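Pix2Seq's core idea of casting detection as sequence generation can be sketched by serializing a bounding box and class label into discrete tokens; the bin count and vocabulary layout below are illustrative assumptions, not the paper's exact configuration:

```python
# Pix2Seq-style serialization sketch: quantize box coordinates into discrete
# bins so a box becomes a short token sequence the model can generate
# autoregressively. NUM_BINS and the vocabulary layout are assumptions.
NUM_BINS = 1000  # coordinate quantization bins

def box_to_tokens(box, class_id, img_w, img_h):
    """box = (xmin, ymin, xmax, ymax) in pixels -> [y, x, y, x, class] tokens."""
    xmin, ymin, xmax, ymax = box

    def quantize(v, size):
        return min(int(v / size * NUM_BINS), NUM_BINS - 1)

    # Class tokens share the vocabulary, placed after the coordinate bins.
    return [quantize(ymin, img_h), quantize(xmin, img_w),
            quantize(ymax, img_h), quantize(xmax, img_w),
            NUM_BINS + class_id]
```

With this interface, detection needs no box-regression heads or matching losses; the model simply learns to emit the next token.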
no code implementations • 1 Feb 2021 • Darius Roman, Saurabh Saxena, Valentin Robu, Michael Pecht, David Flynn
In this paper, we design and evaluate a machine learning pipeline for estimation of battery capacity fade, a metric of battery health, on 179 cells cycled under various conditions.
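The paper's actual pipeline, features, and model are not reproduced here; as a generic sketch of capacity-fade estimation, a simple least-squares fit on synthetic per-cycle features (cycle count and mean discharge voltage are illustrative choices) might look like:

```python
import numpy as np

# Hypothetical sketch, NOT the paper's pipeline: fit remaining capacity
# from synthetic per-cycle features with ordinary least squares.
rng = np.random.default_rng(0)
n = 200
cycles = rng.uniform(0, 1000, n)        # charge/discharge cycle count
mean_v = rng.uniform(3.2, 3.8, n)       # mean discharge voltage (V)
# Synthetic ground truth: capacity fades roughly linearly with cycling.
capacity = 1.0 - 3e-4 * cycles + 0.01 * (mean_v - 3.5) + rng.normal(0, 0.005, n)

X = np.column_stack([np.ones(n), cycles, mean_v])   # design matrix with bias
coef, *_ = np.linalg.lstsq(X, capacity, rcond=None)
pred = X @ coef
rmse = float(np.sqrt(np.mean((pred - capacity) ** 2)))
```

A production pipeline would add feature extraction from raw voltage/current curves and uncertainty estimates, which is closer to what the paper evaluates.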
2 code implementations • EMNLP 2020 • Chitwan Saharia, William Chan, Saurabh Saxena, Mohammad Norouzi
In addition, we adapt the Imputer model for non-autoregressive machine translation and demonstrate that Imputer with just 4 generation steps can match the performance of an autoregressive Transformer baseline.
2 code implementations • ICML 2017 • Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Jie Tan, Quoc Le, Alex Kurakin
Neural networks have proven effective at solving difficult problems, but designing their architectures can be challenging, even for image classification alone.
Ranked #117 on Image Classification on CIFAR-10