Search Results for author: Kevin Leong

Found 3 papers, 0 papers with code

Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment

no code implementations6 May 2024 Abhinav Agarwalla, Abhay Gupta, Alexandre Marques, Shubhra Pandit, Michael Goin, Eldar Kurtic, Kevin Leong, Tuan Nguyen, Mahmoud Salem, Dan Alistarh, Sean Lie, Mark Kurtz

We achieve this for the LLaMA-2 7B model by combining the SparseGPT one-shot pruning method and sparse pretraining of those models on a subset of the SlimPajama dataset mixed with a Python subset of The Stack dataset.

Arithmetic Reasoning Code Generation +2

MediSwift: Efficient Sparse Pre-trained Biomedical Language Models

no code implementations1 Mar 2024 Vithursan Thangarasa, Mahmoud Salem, Shreyas Saxena, Kevin Leong, Joel Hestness, Sean Lie

Large language models (LLMs) are typically trained on general source data for various domains, but a recent surge in domain-specific LLMs has shown their potential to outperform general-purpose models in domain-specific tasks (e. g., biomedicine).

Question Answering

SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models

no code implementations18 Mar 2023 Vithursan Thangarasa, Abhay Gupta, William Marshall, Tianda Li, Kevin Leong, Dennis Decoste, Sean Lie, Shreyas Saxena

In this work, we show the benefits of using unstructured weight sparsity to train only a subset of weights during pre-training (Sparse Pre-training) and then recover the representational capacity by allowing the zeroed weights to learn (Dense Fine-tuning).

Text Generation Text Summarization

Cannot find the paper you are looking for? You can Submit a new open access paper.