no code implementations • 4 Apr 2024 • Zixuan Huang, Justin Johnson, Shoubhik Debnath, James M. Rehg, Chao-yuan Wu
We present PointInfinity, an efficient family of point cloud diffusion models.
4 code implementations • CVPR 2022 • Karttikeya Mangalam, Haoqi Fan, Yanghao Li, Chao-yuan Wu, Bo Xiong, Christoph Feichtenhofer, Jitendra Malik
Reversible Vision Transformers reduce memory footprint by up to 15.5x at roughly identical model complexity, parameter count, and accuracy, demonstrating their promise as an efficient backbone for hardware-constrained training regimes.
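The reversibility behind this memory saving can be sketched as a minimal two-stream residual block (in the style of RevNet): the inputs are exactly recoverable from the outputs, so intermediate activations need not be stored for backpropagation. The `F` and `G` functions below are illustrative placeholders, not the paper's attention/MLP sub-blocks.

```python
import numpy as np

rng = np.random.default_rng(0)
# Fixed random weights stand in for learned sub-networks F and G.
W_f = 0.1 * rng.standard_normal((16, 16))
W_g = 0.1 * rng.standard_normal((16, 16))

def F(x):
    return np.tanh(x @ W_f)

def G(x):
    return np.tanh(x @ W_g)

def reversible_forward(x1, x2):
    # Two-stream coupling: each stream is updated from the other.
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def reversible_inverse(y1, y2):
    # Inputs are recovered exactly -- no activations need to be cached.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = rng.standard_normal(16), rng.standard_normal(16)
y1, y2 = reversible_forward(x1, x2)
r1, r2 = reversible_inverse(y1, y2)
print(np.allclose(x1, r1), np.allclose(x2, r2))  # True True
```

During the backward pass, a reversible network recomputes activations from layer outputs via `reversible_inverse` instead of storing them, trading a modest amount of extra compute for the large memory savings reported above.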
1 code implementation • CVPR 2023 • Chao-yuan Wu, Justin Johnson, Jitendra Malik, Christoph Feichtenhofer, Georgia Gkioxari
We introduce a simple framework that operates on 3D points of single objects or whole scenes coupled with category-agnostic large-scale training from diverse RGB-D videos.
1 code implementation • CVPR 2022 • Chao-yuan Wu, Yanghao Li, Karttikeya Mangalam, Haoqi Fan, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer
Instead of trying to process more frames at once like most existing methods, we propose to process videos in an online fashion and cache "memory" at each iteration.
Ranked #3 on Action Anticipation on EPIC-KITCHENS-100 (using extra training data)
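The online "memory" caching idea can be sketched as follows, assuming a plain dot-product attention and a bounded first-in-first-out cache; the class and parameter names are illustrative placeholders, not the paper's architecture:

```python
from collections import deque
import numpy as np

class StreamingMemory:
    """Process a video one clip at a time, attending over features
    cached from earlier clips instead of loading many frames at once."""

    def __init__(self, dim, max_clips=4):
        self.memory = deque(maxlen=max_clips)  # oldest clips are evicted
        self.dim = dim

    def step(self, clip_feats):
        # Context = features cached from past clips + the current clip.
        context = np.concatenate(list(self.memory) + [clip_feats], axis=0)
        # Plain scaled dot-product attention of the clip over the context.
        scores = clip_feats @ context.T / np.sqrt(self.dim)
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)
        out = weights @ context
        self.memory.append(clip_feats)  # cache for future iterations
        return out

rng = np.random.default_rng(0)
mem = StreamingMemory(dim=8, max_clips=4)
for _ in range(6):  # stream 6 clips; the cache stays bounded at 4
    out = mem.step(rng.standard_normal((5, 8)))
print(out.shape, len(mem.memory))  # (5, 8) 4
```

Because the cache is bounded, per-iteration cost stays constant no matter how long the video is, while the attended context still reaches back several clips.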
45 code implementations • CVPR 2022 • Zhuang Liu, Hanzi Mao, Chao-yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie
The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.
Ranked #1 on Classification on InDL
5 code implementations • CVPR 2022 • Chen Wei, Haoqi Fan, Saining Xie, Chao-yuan Wu, Alan Yuille, Christoph Feichtenhofer
We present Masked Feature Prediction (MaskFeat) for self-supervised pre-training of video models.
Ranked #8 on Action Recognition on AVA v2.2 (using extra training data)
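The pre-training objective can be sketched as regressing model predictions onto precomputed feature targets (HOG descriptors in the paper) at masked positions only; the random arrays below stand in for a real model and feature extractor:

```python
import numpy as np

def masked_feature_loss(preds, targets, mask):
    """L2 regression on masked patches only; visible patches are ignored."""
    sq_err = (preds - targets) ** 2
    return sq_err[mask].mean()

rng = np.random.default_rng(0)
preds = rng.standard_normal((16, 9))    # model outputs for 16 patches
targets = rng.standard_normal((16, 9))  # e.g. a 9-bin HOG descriptor per patch
mask = np.arange(16) < 6                # first 6 patches are masked
loss = masked_feature_loss(preds, targets, mask)
print(loss > 0)  # True: masked patches incur a regression penalty
```

Restricting the loss to masked patches forces the model to predict features it cannot see, which is what makes the task useful for self-supervised pre-training.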
7 code implementations • CVPR 2022 • Yanghao Li, Chao-yuan Wu, Haoqi Fan, Karttikeya Mangalam, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer
In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video classification, as well as object detection.
Ranked #1 on Action Classification on Kinetics-600 (GFLOPs metric)
2 code implementations • CVPR 2021 • Chao-yuan Wu, Philipp Krähenbühl
Our world offers a never-ending stream of visual stimuli, yet today's vision systems only accurately recognize patterns within a few seconds.
Ranked #26 on Action Recognition on AVA v2.2
1 code implementation • ICLR 2021 • Aashaka Shah, Chao-yuan Wu, Jayashree Mohan, Vijay Chidambaram, Philipp Krähenbühl
Deep learning is slowly, but steadily, hitting a memory bottleneck.
1 code implementation • 6 Apr 2020 • Sheng Cao, Chao-yuan Wu, Philipp Krähenbühl
We introduce a simple and efficient lossless image compression algorithm.
3 code implementations • CVPR 2020 • Chao-yuan Wu, Ross Girshick, Kaiming He, Christoph Feichtenhofer, Philipp Krähenbühl
We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU).
Ranked #1 on Video Classification on Kinetics
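The grid-schedule idea can be sketched as cycling through coarser spatiotemporal input shapes while enlarging the mini-batch, so the cost per iteration stays roughly constant; the divisors below are illustrative, not the paper's exact schedule:

```python
def multigrid_schedule(base_batch=8, base_frames=16, base_size=224):
    # (frame divisor, spatial divisor): coarser shapes early in the cycle.
    grids = [(4, 2), (2, 2), (2, 1), (1, 1)]
    schedule = []
    for t_div, s_div in grids:
        # Halving the frame count or each spatial side frees compute;
        # reinvest it in a larger batch so FLOPs per iteration stay constant.
        batch = base_batch * t_div * s_div * s_div
        schedule.append((batch, base_frames // t_div, base_size // s_div))
    return schedule

for batch, frames, size in multigrid_schedule():
    # Cost per iteration ~ batch * frames * size^2 is equal for every grid.
    print(batch, frames, size, batch * frames * size * size)
```

Coarse grids sweep through epochs cheaply early in training, and the schedule anneals back to the full shape, which is how the speedup comes without a loss in final accuracy.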
no code implementations • ICCV 2019 • Wei-Lin Hsiao, Isay Katsman, Chao-yuan Wu, Devi Parikh, Kristen Grauman
We introduce Fashion++, an approach that proposes minimal adjustments to a full-body clothing outfit that will have maximal impact on its fashionability.
4 code implementations • CVPR 2019 • Chao-yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Krähenbühl, Ross Girshick
To understand the world, we humans constantly need to relate the present to the past, and put events in context.
Ranked #4 on Egocentric Activity Recognition on EPIC-KITCHENS-55
1 code implementation • ECCV 2018 • Chao-yuan Wu, Nayan Singhal, Philipp Krähenbühl
An ever-increasing amount of our digital communication, media consumption, and content creation revolves around videos.
1 code implementation • CVPR 2018 • Chao-yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl
We propose to train a deep network directly on the compressed video.
Ranked #46 on Action Classification on Charades (using extra training data)
no code implementations • ICML 2017 • Qi Lei, Ian En-Hsu Yen, Chao-yuan Wu, Inderjit S. Dhillon, Pradeep Ravikumar
We consider the popular problem of sparse empirical risk minimization with linear predictors and a large number of both features and observations.
6 code implementations • ICCV 2017 • Chao-yuan Wu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl
In addition, we show that a simple margin based loss is sufficient to outperform all other loss functions.
Ranked #5 on Image Retrieval on CARS196
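The margin-based loss can be sketched as a hinge on pairwise embedding distance: positives are pulled below `beta - alpha`, negatives pushed beyond `beta + alpha`. In the paper the boundary `beta` is learnable; it is fixed here for illustration.

```python
import numpy as np

def margin_loss(d, same_class, beta=1.2, alpha=0.2):
    """Hinge on pairwise distance d: zero loss once a positive pair is
    closer than beta - alpha or a negative pair is farther than beta + alpha."""
    y = 1.0 if same_class else -1.0
    return max(0.0, alpha + y * (d - beta))

# Positive pair already close enough -> no gradient.
print(margin_loss(0.9, True))   # 0.0
# Positive pair too far apart -> penalized.
print(margin_loss(1.8, True) > 0)
# Negative pair too close -> penalized.
print(margin_loss(1.0, False) > 0)
```

Unlike a triplet loss, this decouples the positive and negative constraints around a single boundary, which is part of why such a simple formulation competes with more elaborate loss functions.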
no code implementations • 31 Mar 2017 • Hsiao-Yu Fish Tung, Chao-yuan Wu, Manzil Zaheer, Alexander J. Smola
Nonparametric models are versatile, albeit computationally expensive, tools for modeling mixtures.
no code implementations • WSDM 2017 • Chao-yuan Wu, Amr Ahmed, Alex Beutel, Alexander J. Smola, How Jing
Recommender systems traditionally assume that user profiles and movie attributes are static.
no code implementations • 6 Dec 2015 • Chao-yuan Wu, Alex Beutel, Amr Ahmed, Alexander J. Smola
With this novel technique, we propose a new Bayesian model for joint collaborative filtering of ratings and text reviews through a sum of simple co-clusterings.