no code implementations • 4 Oct 2023 • Jared Lichtarge, Ehsan Amid, Shankar Kumar, Tien-Ju Yang, Rohan Anil, Rajiv Mathews
Federated Averaging, and the many federated learning algorithm variants that build upon it, share a limitation: all clients must use the same model architecture.
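The constraint is visible in the aggregation step itself: FedAvg averages parameters element-wise, which is only defined when every client's weight arrays have identical shapes, i.e. every client runs the same architecture. A minimal NumPy sketch (function and argument names here are illustrative, not taken from the paper):

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Weighted element-wise average of per-client parameters (FedAvg).

    client_weights: list over clients, each a list of np.ndarray layers.
    client_sizes: list of per-client example counts, used as weights.
    This only works because every client's arrays share the same shapes.
    """
    total = sum(client_sizes)
    num_layers = len(client_weights[0])
    averaged = []
    for layer in range(num_layers):
        layer_avg = sum(
            w[layer] * (n / total)
            for w, n in zip(client_weights, client_sizes)
        )
        averaged.append(layer_avg)
    return averaged
```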
no code implementations • 10 Sep 2022 • Jared Lichtarge, Chris Alberti, Shankar Kumar
For T5, we show that learning hyper-parameters during pretraining can improve performance across downstream NLU tasks.
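The paper's exact tuning scheme is not reproduced here, but one well-known way to learn a hyper-parameter such as the learning rate by gradient descent is the hypergradient rule of Baydin et al. (2018), where d(loss)/d(lr) is estimated from consecutive gradients. A minimal sketch, with `grad_fn` a hypothetical callable returning the loss gradient at `w`:

```python
import numpy as np

def hypergradient_descent(grad_fn, w, lr=0.01, hyper_lr=1e-4, steps=100):
    """Adapt the learning rate online while training.

    Uses the hypergradient estimate d(loss)/d(lr) ~= -g_t . g_{t-1},
    so gradient descent on lr becomes: lr += hyper_lr * (g_t . g_{t-1}).
    """
    prev_grad = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        lr += hyper_lr * np.dot(g, prev_grad)  # hypergradient step on lr
        w = w - lr * g                          # ordinary step on weights
        prev_grad = g
    return w, lr

# Toy usage: minimize loss(w) = ||w||^2, whose gradient is 2w.
w_final, lr_final = hypergradient_descent(lambda w: 2 * w, np.array([5.0]))
```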
no code implementations • 7 Aug 2020 • Jared Lichtarge, Chris Alberti, Shankar Kumar
Recent progress on the task of Grammatical Error Correction (GEC) has been driven by addressing data sparsity, both through new methods for generating large but noisy pretraining data and through the publication of the smaller, higher-quality finetuning data of the BEA-2019 shared task.
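One cheap way to generate large amounts of noisy pretraining data is to corrupt clean monolingual text into synthetic source-target pairs. The toy noising function below illustrates that general idea only; it is not the generation method used in these papers:

```python
import random

def corrupt(sentence, p_drop=0.05, p_swap=0.05):
    """Produce a noisy 'source' from a clean 'target' sentence by
    randomly dropping tokens and swapping adjacent tokens."""
    tokens = sentence.split()
    out = []
    i = 0
    while i < len(tokens):
        if random.random() < p_drop:
            i += 1  # drop this token
            continue
        if i + 1 < len(tokens) and random.random() < p_swap:
            out.extend([tokens[i + 1], tokens[i]])  # swap adjacent pair
            i += 2
            continue
        out.append(tokens[i])
        i += 1
    return " ".join(out)

# Each (corrupt(s), s) pair is one synthetic GEC training example.
```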
no code implementations • NAACL 2019 • Jared Lichtarge, Chris Alberti, Shankar Kumar, Noam Shazeer, Niki Parmar, Simon Tong
We provide a systematic analysis that compares the two approaches to data generation and highlights the effectiveness of ensembling.
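Ensembling seq2seq models at decoding time typically means averaging the models' per-token log-probabilities before choosing the next token. A minimal sketch, where `log_prob_fns` is a hypothetical list of callables mapping a decoded prefix to a vocabulary-sized array of log-probabilities:

```python
import numpy as np

def ensemble_next_token(log_prob_fns, prefix):
    """Average per-token log-probabilities across models and take
    the argmax -- the standard decode-time ensemble for seq2seq."""
    stacked = np.stack([f(prefix) for f in log_prob_fns])  # [n_models, vocab]
    avg_log_probs = stacked.mean(axis=0)
    return int(np.argmax(avg_log_probs))
```

In practice the same averaging is applied inside beam search at every decoding step rather than greedily as shown here.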
no code implementations • 31 Oct 2018 • Jared Lichtarge, Christopher Alberti, Shankar Kumar, Noam Shazeer, Niki Parmar
We describe an approach to Grammatical Error Correction (GEC) that effectively makes use of models trained on large amounts of weakly supervised bitext.
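One inference-time strategy for getting more out of such a model is iterative refinement: re-run the correction model on its own output until it stops changing. A minimal sketch, with `correct_fn` standing in for a trained GEC model (the name is hypothetical, and a real system would also threshold on model confidence before accepting a rewrite):

```python
def iterative_decode(correct_fn, sentence, max_iters=4):
    """Apply a correction model repeatedly until its output is a
    fixed point or the iteration cap is hit, letting it repair
    errors incrementally rather than in a single pass."""
    for _ in range(max_iters):
        revised = correct_fn(sentence)
        if revised == sentence:
            break
        sentence = revised
    return sentence
```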