Search Results for author: Max Marion

Found 2 papers, 0 papers with code

Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models

no code implementations • 30 May 2024 • Zachary Ankner, Cody Blakeney, Kartik Sreenivasan, Max Marion, Matthew L. Leavitt, Mansheej Paul

In this work, we investigate whether small language models can determine high-quality subsets of large-scale text datasets that improve the performance of larger language models.

When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale

no code implementations • 8 Sep 2023 • Max Marion, Ahmet Üstün, Luiza Pozzobon, Alex Wang, Marzieh Fadaee, Sara Hooker

In this work, we take a wider view and explore scalable estimates of data quality that can be used to systematically measure the quality of pretraining data.

Memorization
