1 code implementation • 12 Feb 2024 • Noam Razin, Yotam Alexander, Edo Cohen-Karlik, Raja Giryes, Amir Globerson, Nadav Cohen
This paper theoretically studies the implicit bias of policy gradient in terms of extrapolation to unseen initial states.
1 code implementation • 31 Oct 2023 • Noam Razin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Joshua Susskind, Etai Littwin
Pretrained language models are commonly aligned with human preferences and downstream tasks via reinforcement finetuning (RFT), which refers to maximizing a (possibly learned) reward function using policy gradient algorithms.
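As a rough illustration of the policy-gradient reward maximization that RFT refers to, here is a minimal REINFORCE-style sketch; `policy`, `reward_model`, and the sampling interface are hypothetical placeholders, not the paper's setup.

```python
# Minimal REINFORCE-style sketch of reinforcement finetuning (RFT):
# sample completions from the current policy, score them with a (possibly
# learned) reward model, and ascend the policy gradient of expected reward.
# `policy` and `reward_model` are hypothetical placeholders, not the paper's code.
import torch

def rft_step(policy, reward_model, prompts, optimizer):
    completions, log_probs = policy.sample(prompts)       # log_probs: (batch,) sum of token log-probs
    with torch.no_grad():
        rewards = reward_model(prompts, completions)      # (batch,) scalar reward per completion
        baseline = rewards.mean()                         # simple variance-reduction baseline
    loss = -((rewards - baseline) * log_probs).mean()     # REINFORCE estimator of -E[reward]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return rewards.mean().item()
```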
no code implementations • 24 Oct 2023 • Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran
Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity.
1 code implementation • 20 Mar 2023 • Yotam Alexander, Nimrod De La Vega, Noam Razin, Nadav Cohen
Focusing on locally connected neural networks (a prevalent family of architectures that includes convolutional and recurrent neural networks as well as local self-attention models), we address this problem by adopting theoretical tools from quantum physics.
1 code implementation • NeurIPS 2023 • Noam Razin, Tom Verbin, Nadav Cohen
Formalizing strength of interactions through an established measure known as separation rank, we quantify the ability of certain GNNs to model interaction between a given subset of vertices and its complement, i.e., between the sides of a given partition of input vertices.
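For reference, separation rank with respect to a partition (A, B) of the input variables is standardly defined as the minimal number of separable summands needed to express the function (a generic statement of the measure, not quoted from the paper):

```latex
% Separation rank of f with respect to a partition (A, B) of its input variables:
% the minimal number of summands needed to write f as a sum of separable terms.
\[
\operatorname{sep}(f; A, B) \;=\; \min \Big\{ R \in \mathbb{N} \;:\;
  f(\mathbf{x}) = \sum_{r=1}^{R} g_r(\mathbf{x}_A)\, h_r(\mathbf{x}_B) \Big\}
\]
% where x_A and x_B are the variables on the two sides of the partition;
% sep = 1 means no interaction is modeled between the sides, and higher
% values indicate stronger modeled interaction.
```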
1 code implementation • 27 Jan 2022 • Noam Razin, Asaf Maman, Nadav Cohen
In the pursuit of explaining implicit regularization in deep learning, considerable focus has been given to matrix and tensor factorizations, which correspond to simplified neural networks.
1 code implementation • 19 Feb 2021 • Noam Razin, Asaf Maman, Nadav Cohen
Recent efforts to unravel the mystery of implicit regularization in deep learning have led to a theoretical focus on matrix factorization, i.e., matrix completion via a linear neural network.
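To make the object of study concrete, below is a minimal sketch of matrix completion via a linear neural network (deep matrix factorization): a product of weight matrices is fit to the observed entries with gradient descent. Dimensions, depth, and learning rate are illustrative, not taken from the paper.

```python
# Minimal sketch of matrix completion via a linear neural network
# (deep matrix factorization): the recovered matrix is the end-to-end
# product W_L ... W_1, fit to the observed entries only.
import torch

d, depth, lr = 50, 3, 0.05
target = torch.randn(d, d)
mask = (torch.rand(d, d) < 0.3).float()                   # observed-entry mask

factors = [(1e-2 * torch.randn(d, d)).requires_grad_() for _ in range(depth)]
opt = torch.optim.SGD(factors, lr=lr)

for step in range(2000):
    W = factors[0]
    for f in factors[1:]:
        W = f @ W                                          # end-to-end linear network
    loss = ((W - target) * mask).pow(2).sum() / mask.sum() # loss on observed entries only
    opt.zero_grad()
    loss.backward()
    opt.step()
```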
no code implementations • Findings of the Association for Computational Linguistics 2020 • Itzik Malkiel, Oren Barkan, Avi Caciularu, Noam Razin, Ori Katz, Noam Koenigstein
In addition, we introduce a new language understanding task for wine recommendations using similarities based on professional wine reviews.
1 code implementation • NeurIPS 2020 • Noam Razin, Nadav Cohen
Mathematically characterizing the implicit regularization induced by gradient-based optimization is a longstanding pursuit in the theory of deep learning.
1 code implementation • 14 Aug 2019 • Oren Barkan, Noam Razin, Itzik Malkiel, Ori Katz, Avi Caciularu, Noam Koenigstein
In this paper, we introduce Distilled Sentence Embedding (DSE), a model based on knowledge distillation from cross-attentive models, focusing on sentence-pair tasks.
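A rough sketch of the general distillation idea: a cross-attentive teacher scores sentence pairs jointly, while a student that embeds each sentence independently is trained to reproduce those scores. `teacher`, `student`, and `score_head` are hypothetical placeholders, not the released DSE implementation.

```python
# Illustrative sketch of distilling a cross-attentive teacher into a model
# that embeds sentences independently (placeholders, not the DSE code).
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, score_head, sent_a, sent_b, optimizer):
    with torch.no_grad():
        teacher_logits = teacher(sent_a, sent_b)           # cross-attention over the pair
    emb_a = student(sent_a)                                # independent sentence embeddings
    emb_b = student(sent_b)
    student_logits = score_head(torch.cat([emb_a, emb_b, emb_a * emb_b], dim=-1))
    loss = F.mse_loss(student_logits, teacher_logits)      # match the teacher's predictions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```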