1 code implementation • CVPR 2023 • Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra
We show that all combinations of paired data are not necessary to train such a joint embedding, and only image-paired data is sufficient to bind the modalities together.
Ranked #2 on Zero-shot Classification (unified classes) on LLVIP
1 code implementation • ICCV 2023 • Mannat Singh, Quentin Duval, Kalyan Vasudev Alwala, Haoqi Fan, Vaibhav Aggarwal, Aaron Adcock, Armand Joulin, Piotr Dollár, Christoph Feichtenhofer, Ross Girshick, Rohit Girdhar, Ishan Misra
While MAE has only been shown to scale with the size of models, we find that it scales with the size of the training dataset as well.
Ranked #1 on Few-Shot Image Classification on ImageNet - 10-shot (using extra training data)
1 code implementation • CVPR 2023 • Rohit Girdhar, Alaaeldin El-Nouby, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra
Furthermore, this model can be learned by dropping 90% of the image and 95% of the video patches, enabling extremely fast training of huge model architectures.
no code implementations • CVPR 2022 • Kalyan Vasudev Alwala, Abhinav Gupta, Shubham Tulsiani
Our final 3D reconstruction model is also capable of zero-shot inference on images from unseen object categories and we empirically show that increasing the number of training categories improves the reconstruction quality.
1 code implementation • 18 Nov 2021 • Haoqi Fan, Tullie Murrell, Heng Wang, Kalyan Vasudev Alwala, Yanghao Li, Yilei Li, Bo Xiong, Nikhila Ravi, Meng Li, Haichuan Yang, Jitendra Malik, Ross Girshick, Matt Feiszli, Aaron Adcock, Wan-Yen Lo, Christoph Feichtenhofer
We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing.
no code implementations • ICCV 2021 • Taosha Fan, Kalyan Vasudev Alwala, Donglai Xiang, Weipeng Xu, Todd Murphey, Mustafa Mukadam
We propose a novel sparse constrained formulation and from it derive a real-time optimization method for 3D human pose and shape estimation.
2 code implementations • 19 Jun 2019 • Adithyavairavan Murali, Tao Chen, Kalyan Vasudev Alwala, Dhiraj Gandhi, Lerrel Pinto, Saurabh Gupta, Abhinav Gupta
This paper introduces PyRobot, an open-source robotics framework for research and benchmarking.