no code implementations • ICML 2020 • Samy Jelassi, Carles Domingo-Enrich, Damien Scieur, Arthur Mensch, Joan Bruna
Data-driven modeling increasingly requires finding a Nash equilibrium in multi-player games, e.g., when training GANs.
1 code implementation • 22 Feb 2024 • Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener
The idea is to learn a simple linear function on a model's embedding space that can be used to reweight candidate completions.
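The reweighting idea above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names (`reweight_candidates`, `w`, `b`, `beta`) and the softmax-style combination are assumptions made for the example.

```python
import numpy as np

def reweight_candidates(embeddings, logprobs, w, b=0.0, beta=1.0):
    """Reweight candidate completions with a linear probe on embeddings.

    embeddings: (k, d) array, one embedding per candidate completion
    logprobs:   (k,) base-model log-probabilities of the candidates
    w, b:       learned linear probe (illustrative names)
    beta:       temperature controlling the probe's influence
    """
    scores = embeddings @ w + b          # linear value estimate per candidate
    logits = logprobs + beta * scores    # shift base log-probs by probe scores
    logits -= logits.max()               # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()       # normalized sampling weights
```

With `beta=0` this recovers the base model's distribution over candidates; increasing `beta` shifts mass toward completions the probe scores highly.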
1 code implementation • 1 Feb 2024 • Samy Jelassi, David Brandfonbrener, Sham M. Kakade, Eran Malach
Empirically, we find that transformers outperform GSSMs in terms of efficiency and generalization on synthetic tasks that require copying the context.
no code implementations • 27 Jun 2023 • Samy Jelassi, Stéphane d'Ascoli, Carles Domingo-Enrich, Yuhuai Wu, Yuanzhi Li, François Charton
We find that relative position embeddings enable length generalization for simple tasks, such as addition: models trained on $5$-digit numbers can perform $15$-digit sums.
no code implementations • 13 May 2023 • Samy Jelassi, Boris Hanin, Ziwei Ji, Sashank J. Reddi, Srinadh Bhojanapalli, Sanjiv Kumar
In this short note we consider random fully connected ReLU networks of width $n$ and depth $L$ equipped with a mean-field weight initialization.
no code implementations • 13 Oct 2022 • Samy Jelassi, Michael E. Sander, Yuanzhi Li
On the theoretical side, we consider a binary classification task and show that while the learning problem admits multiple solutions that generalize, our model implicitly learns the spatial structure of the dataset while generalizing: we call this phenomenon patch association.
no code implementations • 9 Oct 2022 • Samy Jelassi, David Dobre, Arthur Mensch, Yuanzhi Li, Gauthier Gidel
By considering an update rule with the magnitude of the Adam update and the normalized direction of SGD, we empirically show that the adaptive magnitude of Adam is key for GAN training.
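The hybrid update rule described above, Adam's step magnitude along SGD's normalized direction, can be sketched as follows. The function name and signature are illustrative assumptions, not the authors' code.

```python
import numpy as np

def hybrid_step(param, grad, adam_update, eps=1e-12):
    """One update combining Adam's magnitude with SGD's direction.

    grad:        raw stochastic gradient (its direction is followed)
    adam_update: the update vector Adam would apply (its norm sets the step size)
    A sketch of the update rule, with illustrative names.
    """
    direction = grad / (np.linalg.norm(grad) + eps)  # normalized SGD direction
    magnitude = np.linalg.norm(adam_update)          # Adam step magnitude
    return param - magnitude * direction
```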
no code implementations • 13 Jul 2022 • Samy Jelassi, Yuanzhi Li
Stochastic gradient descent (SGD) with momentum is widely used for training modern deep learning architectures.
no code implementations • 29 Sep 2021 • Samy Jelassi, Arthur Mensch, Gauthier Gidel, Yuanzhi Li
We empirically show that SGDA with the same vector norm as Adam reaches similar or even better performance than the latter.
no code implementations • 2 Feb 2021 • Luca Venturi, Samy Jelassi, Tristan Ozuch, Joan Bruna
The first contribution of this paper is to extend such results to a more general class of functions, namely functions with piecewise oscillatory structure, by building on the proof strategy of Eldan and Shamir (2016).
5 code implementations • 26 Jan 2021 • Aaron Defazio, Samy Jelassi
We introduce MADGRAD, a novel optimization method in the family of AdaGrad adaptive gradient methods.
no code implementations • 20 Oct 2020 • Samy Jelassi, Aaron Defazio
First-order stochastic optimization methods are currently the most widely used class of methods for training deep neural networks.
no code implementations • ICLR 2021 • Jad Rahme, Samy Jelassi, S. Matthew Weinberg
This not only circumvents the need for an expensive hyper-parameter search (as in prior work), but also provides a principled metric to compare the performance of two auctions (absent from prior work).
1 code implementation • 2 Mar 2020 • Jad Rahme, Samy Jelassi, Joan Bruna, S. Matthew Weinberg
Designing an incentive-compatible auction that maximizes expected revenue is a central problem in auction design.
no code implementations • NeurIPS 2020 • Carles Domingo-Enrich, Samy Jelassi, Arthur Mensch, Grant Rotskoff, Joan Bruna
Our method identifies mixed equilibria in high dimensions and is demonstrably effective for training mixtures of GANs.
1 code implementation • NeurIPS 2019 • Othmane Sebbouh, Nidham Gazagnadou, Samy Jelassi, Francis Bach, Robert M. Gower
Among the very first variance reduced stochastic methods for solving the empirical risk minimization problem was the SVRG method (Johnson & Zhang 2013).
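The SVRG update referenced above can be sketched in a few lines. This is a minimal version of the method of Johnson & Zhang (2013), with illustrative parameter choices: at each outer iteration the full gradient is computed at a snapshot, and inner steps use the variance-reduced gradient `grad_i(i, x) - grad_i(i, snapshot) + full_grad`.

```python
import numpy as np

def svrg(x0, grad_i, n, lr=0.1, outer=10, inner=50, seed=0):
    """Minimal SVRG sketch for f(x) = (1/n) * sum_i f_i(x).

    grad_i(i, x): gradient of the i-th component function at x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(outer):
        snapshot = x.copy()
        # Full gradient at the snapshot, recomputed once per outer loop.
        full_grad = sum(grad_i(i, snapshot) for i in range(n)) / n
        for _ in range(inner):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient: unbiased, with
            # variance vanishing as x and snapshot approach the optimum.
            g = grad_i(i, x) - grad_i(i, snapshot) + full_grad
            x = x - lr * g
    return x
```

On a toy problem with `f_i(x) = 0.5 * (x - a_i)^2`, the iterates converge to the mean of the `a_i`.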
1 code implementation • 29 May 2019 • Samy Jelassi, Carles Domingo-Enrich, Damien Scieur, Arthur Mensch, Joan Bruna
Data-driven modeling increasingly requires finding a Nash equilibrium in multi-player games, e.g., when training GANs.
no code implementations • 5 Feb 2019 • Grant Rotskoff, Samy Jelassi, Joan Bruna, Eric Vanden-Eijnden
Neural networks with a large number of parameters admit a mean-field description, which has recently served as a theoretical explanation for the favorable training properties of "overparameterized" models.
no code implementations • NeurIPS 2018 • Thomas Pumir, Samy Jelassi, Nicolas Boumal
In order to overcome scalability issues, Burer and Monteiro proposed a factorized approach based on optimizing over a matrix Y of size $n$ by $k$ such that $X = YY^*$ is the SDP variable.
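The Burer–Monteiro factorization above can be illustrated on an SDP with unit-diagonal constraint (as in the MaxCut relaxation): instead of optimizing over the $n \times n$ PSD matrix $X$, one optimizes over the $n \times k$ factor $Y$ with $X = YY^*$. This sketch uses projected gradient descent with row renormalization; the function name, step sizes, and projection scheme are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def burer_monteiro(C, k, steps=500, lr=0.05, seed=0):
    """Sketch: minimize <C, X> over PSD X with diag(X) = 1,
    parameterized as X = Y @ Y.T with Y of size n x k.

    The diagonal constraint diag(Y @ Y.T) = 1 is maintained by
    renormalizing each row of Y to unit norm after every step.
    """
    n = C.shape[0]
    rng = np.random.default_rng(seed)
    Y = rng.normal(size=(n, k))
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)      # diag(YY^T) = 1
    for _ in range(steps):
        grad = 2 * C @ Y                               # d/dY tr(C Y Y^T), C symmetric
        Y -= lr * grad
        Y /= np.linalg.norm(Y, axis=1, keepdims=True)  # project back
    return Y @ Y.T
```

By construction the returned `X` is PSD with rank at most `k` and unit diagonal, which is what makes the factorized problem so much cheaper than optimizing over the full matrix.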