Search Results for author: Max Marion

Found 2 papers, 0 papers with code

Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models

no code implementations • 30 May 2024 • Zachary Ankner, Cody Blakeney, Kartik Sreenivasan, Max Marion, Matthew L. Leavitt, Mansheej Paul

In this work, we investigate whether small language models can determine high-quality subsets of large-scale text datasets that improve the performance of larger language models.

When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale

no code implementations • 8 Sep 2023 • Max Marion, Ahmet Üstün, Luiza Pozzobon, Alex Wang, Marzieh Fadaee, Sara Hooker

In this work, we take a wider view and explore scalable estimates of data quality that can be used to systematically measure the quality of pretraining data.

Memorization
