no code implementations • NeurIPS 2023 • Lili Yu, Dániel Simig, Colin Flaherty, Armen Aghajanyan, Luke Zettlemoyer, Mike Lewis
Autoregressive transformers are spectacular models for short sequences but scale poorly to long sequences such as high-resolution images, podcasts, code, or books.
1 code implementation • 16 Mar 2023 • Amro Abbas, Kushal Tirumala, Dániel Simig, Surya Ganguli, Ari S. Morcos
Analyzing a subset of LAION, we show that SemDeDup can remove 50% of the data with minimal performance loss, effectively halving training time.