Search Results for author: Saurabh Agarwal

Found 15 papers, 7 papers with code

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding

no code implementations • 25 Apr 2024 • Mostafa Elhoushi, Akshat Shrivastava, Diana Liskovich, Basil Hosmer, Bram Wasti, Liangzhen Lai, Anas Mahmoud, Bilge Acun, Saurabh Agarwal, Ahmed Roman, Ahmed A Aly, Beidi Chen, Carole-Jean Wu

We present LayerSkip, an end-to-end solution to speed up inference of large language models (LLMs).

Continual Pretraining, Semantic Parsing

CHAI: Clustered Head Attention for Efficient LLM Inference

no code implementations • 12 Mar 2024 • Saurabh Agarwal, Bilge Acun, Basil Hosmer, Mostafa Elhoushi, Yejin Lee, Shivaram Venkataraman, Dimitris Papailiopoulos, Carole-Jean Wu

We observe a high degree of redundancy across attention heads in which tokens they attend to.

Decoding Speculative Decoding

1 code implementation • 2 Feb 2024 • Minghao Yan, Saurabh Agarwal, Shivaram Venkataraman

Speculative Decoding is a widely used technique to speed up inference for Large Language Models (LLMs) without sacrificing quality.

Language Modelling

MultiFusionNet: Multilayer Multimodal Fusion of Deep Neural Networks for Chest X-Ray Image Classification

no code implementations • 1 Jan 2024 • Saurabh Agarwal, K. V. Arya, Yogesh Kumar Meena

The proposed multilayer multimodal fusion model, along with the FDSFM module, holds promise for accurate disease classification and can also be extended to other disease classifications in chest X-ray images.

Image Classification

Cuttlefish: Low-Rank Model Training without All the Tuning

1 code implementation • 4 May 2023 • Hongyi Wang, Saurabh Agarwal, Pongsakorn U-chupala, Yoshiki Tanaka, Eric P. Xing, Dimitris Papailiopoulos

Cuttlefish leverages the observation that after a few epochs of full-rank training, the stable rank (i.e., an approximation of the true rank) of each layer stabilizes at a constant value.

BagPipe: Accelerating Deep Recommendation Model Training

no code implementations • 24 Feb 2022 • Saurabh Agarwal, Chengpo Yan, Ziyi Zhang, Shivaram Venkataraman

Based on these insights, we develop Bagpipe, a system for training deep recommendation models that uses caching and prefetching to overlap remote embedding accesses with the computation.

Pufferfish: Communication-efficient Models At No Extra Cost

1 code implementation • 5 Mar 2021 • Hongyi Wang, Saurabh Agarwal, Dimitris Papailiopoulos

In this work, we present Pufferfish, a communication- and computation-efficient distributed training framework that incorporates gradient compression into the model training process by training low-rank, pre-factorized deep networks.

Quantization

On the Utility of Gradient Compression in Distributed Training Systems

1 code implementation • 28 Feb 2021 • Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, Dimitris Papailiopoulos

A rich body of prior work has highlighted the existence of communication bottlenecks in synchronous data-parallel training.

Model Compression

AutoFreeze: Automatically Freezing Model Blocks to Accelerate Fine-tuning

1 code implementation • 2 Feb 2021 • YuHan Liu, Saurabh Agarwal, Shivaram Venkataraman

With the rapid adoption of machine learning (ML), a number of domains now fine-tune models that were pre-trained on a large corpus of data.

Regularized Graph Convolutional Networks for Short Text Classification

no code implementations • COLING 2020 • Kshitij Tayal, Nikhil Rao, Saurabh Agarwal, Xiaowei Jia, Karthik Subbian, Vipin Kumar

The lack of structure in short text sequences limits the success of popular NLP methods based on deep learning.

Text Classification

Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification

3 code implementations • 29 Oct 2020 • Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris Papailiopoulos

These techniques usually require choosing a static compression ratio, which often forces users to balance the trade-off between model accuracy and per-iteration speedup.

Quantization

Attack of the Tails: Yes, You Really Can Backdoor Federated Learning

2 code implementations • NeurIPS 2020 • Hongyi Wang, Kartik Sreenivasan, Shashank Rajput, Harit Vishwakarma, Saurabh Agarwal, Jy-yong Sohn, Kangwook Lee, Dimitris Papailiopoulos

Due to its decentralized nature, Federated Learning (FL) lends itself to adversarial attacks in the form of backdoors during training.

Fairness, Federated Learning

Scalable K-Medoids via True Error Bound and Familywise Bandits

no code implementations • 27 May 2019 • Aravindakshan Babu, Saurabh Agarwal, Sudarshan Babu, Hariharan Chandrasekaran

K-Medoids (KM) is a standard clustering method, used extensively on semi-metric data. Error analyses of KM have traditionally used an in-sample notion of error, which can be far from the true error and suffer from a generalization gap.

Clustering

Graph based Question Answering System

no code implementations • 5 Dec 2018 • Piyush Mital, Saurabh Agarwal, Bhargavi Neti, Yashodhara Haribhakta, Vibhavari Kamble, Krishnanjan Bhattacharjee, Debashri Das, Swati Mehta, Ajai Kumar

In today's digital age, in the dawning era of big data analytics, it is not the information but the linking of information through entities and actions that defines the discourse.

Question Answering, Retrieval

A Novel Approach to Develop a New Hybrid Technique for Trademark Image Retrieval

no code implementations • 8 Nov 2014 • Saurabh Agarwal, Punit Kumar Johari

Trademark image retrieval plays a vital role as part of content-based image retrieval (CBIR) systems.

Image Retrieval, Retrieval
