Search Results for author: Eldar Kurtic

Found 13 papers, 7 papers with code

Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment

no code implementations • 6 May 2024 • Abhinav Agarwalla, Abhay Gupta, Alexandre Marques, Shubhra Pandit, Michael Goin, Eldar Kurtic, Kevin Leong, Tuan Nguyen, Mahmoud Salem, Dan Alistarh, Sean Lie, Mark Kurtz

We achieve this for the LLaMA-2 7B model by combining the SparseGPT one-shot pruning method and sparse pretraining of those models on a subset of the SlimPajama dataset mixed with a Python subset of The Stack dataset.

Arithmetic Reasoning Code Generation +2

Paper
Add Code

How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry'' Benchmark

no code implementations • 21 Dec 2023 • Eldar Kurtic, Torsten Hoefler, Dan Alistarh

Pruning large language models (LLMs) from the BERT family has emerged as a standard compression benchmark, and several pruning methods have been proposed for this task.

Knowledge Distillation Language Modelling

Paper
Add Code

Sparse Fine-tuning for Inference Acceleration of Large Language Models

2 code implementations • 10 Oct 2023 • Eldar Kurtic, Denis Kuznedelev, Elias Frantar, Michael Goin, Dan Alistarh

While the standard approach is to leverage sparsity for computational reduction, we observe that in the case of memory-bound LLMs sparsity can also be leveraged for reducing memory bandwidth.

Quantization Text Generation +1

2,888

Paper
Code

Accurate Neural Network Pruning Requires Rethinking Sparse Optimization

no code implementations • 3 Aug 2023 • Denis Kuznedelev, Eldar Kurtic, Eugenia Iofinova, Elias Frantar, Alexandra Peste, Dan Alistarh

Obtaining versions of deep neural networks that are both highly-accurate and highly-sparse is one of the main challenges in the area of model compression, and several high-performance pruning techniques have been investigated by the community.

Model Compression Network Pruning +1

Paper
Add Code

Error Feedback Can Accurately Compress Preconditioners

1 code implementation • 9 Jun 2023 • Ionut-Vlad Modoranu, Aleksei Kalinov, Eldar Kurtic, Elias Frantar, Dan Alistarh

Experiments on deep neural networks show that this approach can compress full-matrix preconditioners to up to 99\% sparsity without accuracy loss, effectively removing the memory overhead of full-matrix preconditioners such as GGT and M-FAC.

Classification Second-order methods

Paper
Code

Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression

no code implementations • 25 Mar 2023 • Denis Kuznedelev, Soroush Tabesh, Kimia Noorbakhsh, Elias Frantar, Sara Beery, Eldar Kurtic, Dan Alistarh

To address this, we ask: can we quickly compress large generalist models into accurate and efficient specialists?

Paper
Add Code

SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks

1 code implementation • 9 Feb 2023 • Mahdi Nikdan, Tommaso Pegolotti, Eugenia Iofinova, Eldar Kurtic, Dan Alistarh

We provide a new efficient version of the backpropagation algorithm, specialized to the case where the weights of the neural network being trained are sparse.

Transfer Learning

Paper
Code

ZipLM: Inference-Aware Structured Pruning of Language Models

1 code implementation • NeurIPS 2023 • Eldar Kurtic, Elias Frantar, Dan Alistarh

Furthermore, ZipLM achieves superior results for a fraction of the computational cost relative to prior distillation and pruning techniques, making it a cost-effective approach for generating an entire family of smaller, faster, and highly accurate models, guaranteed to meet the desired inference specifications.

Paper
Code

CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models

no code implementations • NeurIPS 2023 • Denis Kuznedelev, Eldar Kurtic, Elias Frantar, Dan Alistarh

To further showcase CAP's accuracy and scalability, we use it to show for the first time that extremely-accurate large vision models, trained via self-supervised techniques, can also be pruned to moderate sparsities, with negligible accuracy loss.

Image Classification Quantization

Paper
Add Code

GMP*: Well-Tuned Gradual Magnitude Pruning Can Outperform Most BERT-Pruning Methods

no code implementations • 12 Oct 2022 • Eldar Kurtic, Dan Alistarh

We revisit the performance of the classic gradual magnitude pruning (GMP) baseline for large language models, focusing on the classic BERT benchmark on various popular tasks.

Paper
Add Code

CrAM: A Compression-Aware Minimizer

1 code implementation • 28 Jul 2022 • Alexandra Peste, Adrian Vladu, Eldar Kurtic, Christoph H. Lampert, Dan Alistarh

In this work we propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way, in order to produce models whose local loss behavior is stable under compression operations such as pruning.

Image Classification Language Modelling +2

Paper
Code

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models

2 code implementations • 14 Mar 2022 • Eldar Kurtic, Daniel Campos, Tuan Nguyen, Elias Frantar, Mark Kurtz, Benjamin Fineran, Michael Goin, Dan Alistarh

We perform an in-depth study of the accuracy-compression trade-off for unstructured weight pruning of BERT models.

Quantization

2,888

Paper
Code

M-FAC: Efficient Matrix-Free Approximations of Second-Order Information

2 code implementations • NeurIPS 2021 • Elias Frantar, Eldar Kurtic, Dan Alistarh

We propose two new algorithms as part of a framework called M-FAC: the first algorithm is tailored towards network compression and can compute the IHVP for dimension $d$, if the Hessian is given as a sum of $m$ rank-one matrices, using $O(dm^2)$ precomputation, $O(dm)$ cost for computing the IHVP, and query cost $O(m)$ for any single element of the inverse Hessian.

Network Pruning Second-order methods

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.