no code implementations • 20 Feb 2024 • Shitao Tang, Jiacheng Chen, Dilin Wang, Chengzhou Tang, Fuyang Zhang, Yuchen Fan, Vikas Chandra, Yasutaka Furukawa, Rakesh Ranjan
MVDiffusion++ achieves superior flexibility and scalability with two surprisingly simple ideas: 1) A ``pose-free architecture'' where standard self-attention among 2D latent features learns 3D consistency across an arbitrary number of conditional and generation views without explicitly using camera pose information; and 2) A ``view dropout strategy'' that discards a substantial number of output views during training, which reduces the training-time memory footprint and enables dense and high-resolution view synthesis at test time.
no code implementations • 31 Dec 2023 • Peihao Wang, Dejia Xu, Zhiwen Fan, Dilin Wang, Sreyas Mohan, Forrest Iandola, Rakesh Ranjan, Yilei Li, Qiang Liu, Zhangyang Wang, Vikas Chandra
In this paper, we reveal that the existing score distillation-based text-to-3D generation frameworks degenerate to maximal likelihood seeking on each view independently and thus suffer from the mode collapse problem, manifesting as the Janus artifact in practice.
no code implementations • 31 Dec 2023 • Peihao Wang, Zhiwen Fan, Dejia Xu, Dilin Wang, Sreyas Mohan, Forrest Iandola, Rakesh Ranjan, Yilei Li, Qiang Liu, Zhangyang Wang, Vikas Chandra
In this paper, we reveal that the gradient estimation in score distillation is inherent to high variance.
1 code implementation • 1 Dec 2023 • Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xiang, Fanyi Xiao, Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman Krishnamoorthi, Vikas Chandra
On segment anything task such as zero-shot instance segmentation, our EfficientSAMs with SAMI-pretrained lightweight image encoders perform favorably with a significant gain (e. g., ~4 AP on COCO/LVIS) over other fast SAM models.
Ranked #3 on Zero-Shot Instance Segmentation on LVIS v1.0 val
no code implementations • 5 Oct 2023 • Zhiwen Fan, Panwang Pan, Peihao Wang, Yifan Jiang, Hanwen Jiang, Dejia Xu, Zehao Zhu, Dilin Wang, Zhangyang Wang
To address this challenge, we introduce PF-GRT, a new Pose-Free framework for Generalizable Rendering Transformer, eliminating the need for pre-computed camera poses and instead leveraging feature-matching learned directly from data.
no code implementations • 5 Sep 2023 • Yuan Shangguan, Haichuan Yang, Danni Li, Chunyang Wu, Yassir Fathullah, Dilin Wang, Ayushi Dalmia, Raghuraman Krishnamoorthi, Ozlem Kalinli, Junteng Jia, Jay Mahadeokar, Xin Lei, Mike Seltzer, Vikas Chandra
Results demonstrate that our TODM Supernet either matches or surpasses the performance of manually tuned models by up to a relative of 3% better in word error rate (WER), while efficiently keeping the cost of training many models at a small constant.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 8 Jun 2023 • Ganesh Jawahar, Haichuan Yang, Yunyang Xiong, Zechun Liu, Dilin Wang, Fei Sun, Meng Li, Aasish Pappu, Barlas Oguz, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Raghuraman Krishnamoorthi, Vikas Chandra
In addition, the proposed method achieves the SOTA performance in NAS for building fast machine translation models, yielding better latency-BLEU tradeoff compared to HAT, state-of-the-art NAS for MT.
no code implementations • 12 Dec 2022 • Lemeng Wu, Dilin Wang, Meng Li, Yunyang Xiong, Raghuraman Krishnamoorthi, Qiang Liu, Vikas Chandra
Fusing 3D LiDAR features with 2D camera features is a promising technique for enhancing the accuracy of 3D detection, thanks to their complementary physical properties.
1 code implementation • CVPR 2023 • Lemeng Wu, Dilin Wang, Chengyue Gong, Xingchao Liu, Yunyang Xiong, Rakesh Ranjan, Raghuraman Krishnamoorthi, Vikas Chandra, Qiang Liu
We perform evaluations on multiple 3D tasks and find that our PSF performs comparably to the standard diffusion model, outperforming other efficient 3D point cloud generation methods.
no code implementations • 17 Nov 2021 • Zhaoshuo Li, Wei Ye, Dilin Wang, Francis X. Creighton, Russell H. Taylor, Ganesh Venkatesh, Mathias Unberath
We present a framework named Consistent Online Dynamic Depth (CODD) to produce temporally consistent depth estimates in dynamic scenes in an online setting.
1 code implementation • CVPR 2022 • Jiaqi Gu, Hyoukjun Kwon, Dilin Wang, Wei Ye, Meng Li, Yu-Hsin Chen, Liangzhen Lai, Vikas Chandra, David Z. Pan
Therefore, we propose HRViT, which enhances ViTs to learn semantically-rich and spatially-precise multi-scale representations by integrating high-resolution multi-branch architectures with ViTs.
Ranked #25 on Semantic Segmentation on Cityscapes val
no code implementations • 15 Oct 2021 • Haichuan Yang, Yuan Shangguan, Dilin Wang, Meng Li, Pierce Chuang, Xiaohui Zhang, Ganesh Venkatesh, Ozlem Kalinli, Vikas Chandra
From wearables to powerful smart devices, modern automatic speech recognition (ASR) models run on a variety of edge devices with different computational budgets.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 7 Oct 2021 • Yangyang Shi, Chunyang Wu, Dilin Wang, Alex Xiao, Jay Mahadeokar, Xiaohui Zhang, Chunxi Liu, Ke Li, Yuan Shangguan, Varun Nagaraja, Ozlem Kalinli, Mike Seltzer
This paper improves the streaming transformer transducer for speech recognition by using non-causal convolution.
1 code implementation • ICLR 2022 • Chengyue Gong, Dilin Wang, Meng Li, Xinlei Chen, Zhicheng Yan, Yuandong Tian, Qiang Liu, Vikas Chandra
In this work, we observe that the poor performance is due to a gradient conflict issue: the gradients of different sub-networks conflict with that of the supernet more severely in ViTs than CNNs, which leads to early saturation in training and inferior convergence.
Ranked #7 on Neural Architecture Search on ImageNet
no code implementations • 9 Jul 2021 • Dilin Wang, Yuan Shangguan, Haichuan Yang, Pierce Chuang, Jiatong Zhou, Meng Li, Ganesh Venkatesh, Ozlem Kalinli, Vikas Chandra
We apply noisy training to improve both dense and sparse state-of-the-art Emformer models and observe consistent WER reduction.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
1 code implementation • 26 Apr 2021 • Chengyue Gong, Dilin Wang, Meng Li, Vikas Chandra, Qiang Liu
To alleviate this problem, in this work, we introduce novel loss functions in vision transformer training to explicitly encourage diversity across patch representations for more discriminative feature extraction.
Ranked #20 on Semantic Segmentation on Cityscapes val
2 code implementations • 16 Feb 2021 • Dilin Wang, Chengyue Gong, Meng Li, Qiang Liu, Vikas Chandra
Weight-sharing NAS builds a supernet that assembles all the architectures as its sub-networks and jointly trains the supernet with the sub-networks.
Ranked #12 on Neural Architecture Search on ImageNet
no code implementations • CVPR 2021 • Chengyue Gong, Dilin Wang, Qiang Liu
Semi-supervised learning (SSL) is a key approach toward more data-efficient machine learning by jointly leverage both labeled and unlabeled data.
1 code implementation • CVPR 2021 • Chengyue Gong, Dilin Wang, Meng Li, Vikas Chandra, Qiang Liu
Data augmentation (DA) is an essential technique for training state-of-the-art deep learning systems.
2 code implementations • CVPR 2021 • Dilin Wang, Meng Li, Chengyue Gong, Vikas Chandra
Our discovered model family, AttentiveNAS models, achieves top-1 accuracy from 77. 3% to 80. 7% on ImageNet, and outperforms SOTA models, including BigNAS and Once-for-All networks.
Ranked #21 on Neural Architecture Search on ImageNet
1 code implementation • NeurIPS 2019 • Dilin Wang, Ziyang Tang, Chandrajit Bajaj, Qiang Liu
Stein variational gradient descent (SVGD) is a particle-based inference algorithm that leverages gradient information for efficient approximate inference.
1 code implementation • ICLR 2020 • Dilin Wang, Meng Li, Lemeng Wu, Vikas Chandra, Qiang Liu
Designing energy-efficient networks is of critical importance for enabling state-of-the-art deep learning in mobile and edge settings where the computation and energy budgets are highly limited.
1 code implementation • NeurIPS 2019 • Qiang Liu, Lemeng Wu, Dilin Wang
We develop a progressive training approach for neural networks which adaptively grows the network structure by splitting existing neurons to multiple off-springs.
1 code implementation • 10 Jun 2019 • Dilin Wang, Chengyue Gong, Qiang Liu
Theoretically, we show that our adversarial mechanism effectively encourages the diversity of the embedding vectors, helping to increase the robustness of models.
Ranked #5 on Language Modelling on Penn Treebank (Word Level)
1 code implementation • NeurIPS 2018 • Dilin Wang, Hao liu, Qiang Liu
Variational inference with {\alpha}-divergences has been widely used in modern probabilistic machine learning.
no code implementations • NeurIPS 2018 • Qiang Liu, Dilin Wang
Stein variational gradient descent (SVGD) is a non-parametric inference algorithm that evolves a set of particles to fit a given distribution of interest.
no code implementations • ICML 2018 • Dilin Wang, Zhe Zeng, Qiang Liu
We propose a novel distributed inference algorithm for continuous graphical models, by extending Stein variational gradient descent (SVGD) to leverage the Markov dependency structure of the distribution of interest.
no code implementations • 20 Jul 2017 • Yihao Feng, Dilin Wang, Qiang Liu
We propose a simple algorithm to train stochastic neural networks to draw samples from given target distributions for probabilistic inference.
no code implementations • 4 Jul 2017 • Qiang Liu, Dilin Wang
We propose a number of new algorithms for learning deep energy models and demonstrate their properties.
1 code implementation • 6 Nov 2016 • Dilin Wang, Qiang Liu
We propose a simple algorithm to train stochastic neural networks to draw samples from given target distributions for probabilistic inference.
Ranked #19 on Conditional Image Generation on CIFAR-10 (Inception score metric)
13 code implementations • NeurIPS 2016 • Qiang Liu, Dilin Wang
We propose a general purpose variational inference algorithm that forms a natural counterpart of gradient descent for optimization.