no code implementations • 7 Dec 2023 • Vimal Thilak, Chen Huang, Omid Saremi, Laurent Dinh, Hanlin Goh, Preetum Nakkiran, Joshua M. Susskind, Etai Littwin
In this paper, we introduce LiDAR (Linear Discriminant Analysis Rank), a metric designed to measure the quality of representations within joint-embedding (JE) architectures.
no code implementations • 7 Dec 2023 • Micah Goldblum, Anima Anandkumar, Richard Baraniuk, Tom Goldstein, Kyunghyun Cho, Zachary C. Lipton, Melanie Mitchell, Preetum Nakkiran, Max Welling, Andrew Gordon Wilson
The goal of this series is to chronicle opinions and issues in the field of machine learning as they stand today and as they change over time.
1 code implementation • 31 Oct 2023 • Noam Razin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Joshua Susskind, Etai Littwin
Pretrained language models are commonly aligned with human preferences and downstream tasks via reinforcement finetuning (RFT), which refers to maximizing a (possibly learned) reward function using policy gradient algorithms.
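As a toy illustration of the policy-gradient objective behind RFT, here is a minimal REINFORCE sketch on a small categorical "policy" with a fixed reward vector; the five-action policy and reward are stand-ins, not the paper's language-model setup.

```python
import torch

# Toy REINFORCE sketch: maximize a fixed reward by increasing the
# log-probability of high-reward samples. A stand-in for reinforcement
# finetuning of a language model, not the paper's actual setup.
logits = torch.zeros(5, requires_grad=True)        # toy "policy" over 5 actions
reward = torch.tensor([0.0, 0.0, 1.0, 0.0, 0.0])   # only action 2 is rewarded
opt = torch.optim.Adam([logits], lr=0.1)

for _ in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    a = dist.sample()
    loss = -reward[a] * dist.log_prob(a)           # REINFORCE estimator of -E[reward]
    opt.zero_grad()
    loss.backward()
    opt.step()

print(torch.softmax(logits, dim=0))                # mass concentrates on action 2
```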
no code implementations • 24 Oct 2023 • Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran
Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity.
1 code implementation • 21 Sep 2023 • Jarosław Błasiok, Preetum Nakkiran
We show that a simple modification fixes both constructions: first smooth the observations using an RBF kernel, then compute the Expected Calibration Error (ECE) of this smoothed function.
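A rough NumPy sketch of the smooth-then-measure idea: kernel-regress the outcomes on the predicted confidences with an RBF kernel, then take the average gap between each prediction and the smoothed outcome at that prediction. This is only an illustration, not the authors' exact estimator.

```python
import numpy as np

def smoothed_ece(confidences, outcomes, sigma=0.05):
    """Sketch: RBF-kernel smoothing of outcomes as a function of confidence,
    followed by the average calibration gap of the smoothed function.
    Illustrative only; not the paper's exact estimator."""
    confidences = np.asarray(confidences, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)

    # RBF (Gaussian) kernel weights between every pair of predictions.
    diffs = confidences[:, None] - confidences[None, :]
    weights = np.exp(-0.5 * (diffs / sigma) ** 2)

    # Nadaraya-Watson estimate of E[outcome | prediction] at each point.
    smoothed = (weights @ outcomes) / weights.sum(axis=1)

    # Average gap between prediction and smoothed outcome.
    return np.mean(np.abs(smoothed - confidences))

# Example: well-calibrated synthetic predictions give a value close to 0.
rng = np.random.default_rng(0)
p = rng.uniform(size=2_000)
y = (rng.uniform(size=2_000) < p).astype(float)
print(smoothed_ece(p, y))
```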
no code implementations • NeurIPS 2023 • Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Preetum Nakkiran
Optimizing proper loss functions is popularly believed to yield predictors with good calibration properties; the intuition being that for such losses, the global optimum is to predict the ground-truth probabilities, which is indeed calibrated.
no code implementations • 19 Apr 2023 • Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Adam Tauman Kalai, Preetum Nakkiran
We show that minimizing the squared loss over all neural nets of size $n$ implies multicalibration for all but a bounded number of unlucky values of $n$.
no code implementations • 30 Nov 2022 • Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Preetum Nakkiran
We study the fundamental question of how to define and measure the distance from calibration for probabilistic predictors.
no code implementations • 8 Oct 2022 • Elan Rosenfeld, Preetum Nakkiran, Hadi Pouransari, Oncel Tuzel, Fartash Faghri
Recent advances in learning aligned multimodal representations have been primarily driven by training large neural networks on massive, noisy paired-modality datasets.
1 code implementation • 5 Oct 2022 • A. Michael Carrell, Neil Mallinar, James Lucas, Preetum Nakkiran
We propose a systematic way to study the calibration error: by decomposing it into (1) calibration error on the train set, and (2) the calibration generalization gap.
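A minimal sketch of this decomposition, using a standard binned ECE helper and synthetic confidences and correctness indicators standing in for a real model's train and test outputs:

```python
import numpy as np

def binned_ece(conf, correct, n_bins=15):
    """Standard binned expected calibration error (illustrative helper)."""
    conf, correct = np.asarray(conf), np.asarray(correct, dtype=float)
    bins = np.clip((conf * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece

# Hypothetical model outputs: calibrated on train, overconfident on test.
rng = np.random.default_rng(0)
conf_train = rng.uniform(0.5, 1.0, 20_000)
correct_train = (rng.uniform(size=20_000) < conf_train).astype(float)
conf_test = rng.uniform(0.5, 1.0, 20_000)
correct_test = (rng.uniform(size=20_000) < conf_test - 0.1).astype(float)

train_ece = binned_ece(conf_train, correct_train)      # (1) train calibration error
test_ece = binned_ece(conf_test, correct_test)
gap = test_ece - train_ece                             # (2) calibration generalization gap
print(train_ece, test_ece, gap)
```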
no code implementations • 14 Jul 2022 • Neil Mallinar, James B. Simon, Amirhesam Abedsoltan, Parthe Pandit, Mikhail Belkin, Preetum Nakkiran
In this work we argue that while benign overfitting has been instructive and fruitful to study, many real interpolating methods, such as neural networks, do not fit benignly: modest noise in the training set causes nonzero (but non-infinite) excess risk at test time, implying that these models are neither benign nor catastrophic but rather fall in an intermediate regime.
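A toy illustration of such an intermediate ("tempered") regime, using 1-nearest-neighbor as the interpolating method on synthetic data with label noise (not the paper's experiments):

```python
import numpy as np

# 1-NN fits the noisy train labels exactly, yet its clean test error lands
# roughly at the label-noise rate: nonzero excess risk, but far from chance.
rng = np.random.default_rng(0)
d, noise = 10, 0.1

def make_data(n, clean=False):
    X = rng.normal(size=(n, d))
    y = (X[:, 0] > 0).astype(int)                       # noiseless target
    if not clean:
        y = np.where(rng.uniform(size=n) < noise, 1 - y, y)
    return X, y

X_tr, y_tr = make_data(2_000)
X_te, y_te = make_data(500, clean=True)

# 1-NN prediction via pairwise squared distances.
d2 = (X_te ** 2).sum(1)[:, None] + (X_tr ** 2).sum(1)[None, :] - 2 * X_te @ X_tr.T
pred = y_tr[np.argmin(d2, axis=1)]
print("clean test error:", np.mean(pred != y_te))       # nonzero, nowhere near 0.5
```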
no code implementations • 20 Jun 2022 • Nikhil Vyas, Yamini Bansal, Preetum Nakkiran
The "Neural Tangent Kernel" (NTK; Jacot et al., 2018) and its empirical variants have been proposed as proxies to capture certain behaviors of real neural networks.
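For concreteness, a minimal sketch of the *empirical* NTK, which for two inputs is the inner product of the network's parameter gradients at those inputs (a standard definition, not the paper's experimental setup):

```python
import torch
import torch.nn as nn

# Small MLP standing in for "a real neural network".
net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

def grad_vector(x):
    """Flattened gradient of the scalar network output with respect to all parameters."""
    net.zero_grad()
    net(x.unsqueeze(0)).sum().backward()
    return torch.cat([p.grad.reshape(-1) for p in net.parameters()])

x1, x2 = torch.randn(10), torch.randn(10)
k12 = grad_vector(x1) @ grad_vector(x2)   # empirical NTK entry K(x1, x2)
print(k12.item())
```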
1 code implementation • 7 Apr 2022 • Bogdan Kulynych, Yao-Yuan Yang, Yaodong Yu, Jarosław Błasiok, Preetum Nakkiran
In contrast, we show that Differentially-Private (DP) training provably ensures the high-level WYSIWYG property, which we quantify using a notion of distributional generalization.
no code implementations • 28 Mar 2022 • Gal Kaplun, Eran Malach, Preetum Nakkiran, Shai Shalev-Shwartz
We relate the notion of such samplers to knowledge distillation, where a student network imitates the outputs of a teacher on unlabeled data.
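A sketch of the standard distillation loss this refers to, where the student matches the teacher's temperature-softened output distribution on unlabeled inputs (a common recipe, not necessarily the paper's exact objective):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    t = temperature
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    teacher_p = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(student_logp, teacher_p, reduction="batchmean") * (t * t)

# Toy usage: random logits stand in for the two networks' outputs on a batch.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
print(distillation_loss(student_logits, teacher_logits))
```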
1 code implementation • 20 Feb 2022 • Gal Kaplun, Nikhil Ghosh, Saurabh Garg, Boaz Barak, Preetum Nakkiran
In this work, we propose a new approach: we measure the performance of a collection of models when evaluated on a *single input point*.
no code implementations • 17 Feb 2022 • Like Hui, Mikhail Belkin, Preetum Nakkiran
We refine the Neural Collapse conjecture into two separate conjectures: collapse on the train set (an optimization property) and collapse on the test distribution (a generalization property).
no code implementations • 9 Nov 2021 • Preetum Nakkiran
For a given distribution, learning algorithm, and performance metric, the rate of convergence (or data-scaling law) is the asymptotic behavior of the algorithm's test performance as a function of the number of training samples.
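A minimal sketch of estimating such a rate empirically: fit a power law to (hypothetical) test-error measurements at increasing sample sizes via least squares in log-log space.

```python
import numpy as np

# Hypothetical measurements: test error at increasing train-set sizes n.
ns = np.array([1_000, 2_000, 4_000, 8_000, 16_000, 32_000])
errs = np.array([0.210, 0.158, 0.117, 0.088, 0.066, 0.049])

# Fit err ~ c * n^(-alpha); the slope in log-log space gives the scaling exponent.
slope, log_c = np.polyfit(np.log(ns), np.log(errs), 1)
alpha = -slope
print(f"error ~ {np.exp(log_c):.2f} * n^(-{alpha:.2f})")
```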
no code implementations • 29 Sep 2021 • Preetum Nakkiran, Yamini Bansal
Classifiers in machine learning are often reduced to single scalar quantities, such as test error or loss.
no code implementations • NeurIPS 2021 • Yamini Bansal, Preetum Nakkiran, Boaz Barak
We revisit and extend model stitching (Lenc & Vedaldi 2015) as a methodology to study the internal representations of neural networks.
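A minimal PyTorch sketch of model stitching in the usual convolutional setup; `bottom_a` and `top_b` are hypothetical frozen sub-networks, and only the 1x1-conv stitching layer is trained.

```python
import torch
import torch.nn as nn

class Stitched(nn.Module):
    """Model stitching sketch: frozen bottom layers of network A, a trainable
    1x1-conv stitching layer, frozen top layers of network B. `bottom_a` must
    produce, and `top_b` must consume, convolutional feature maps."""

    def __init__(self, bottom_a, top_b, c_in, c_out):
        super().__init__()
        self.bottom, self.top = bottom_a, top_b
        self.stitch = nn.Conv2d(c_in, c_out, kernel_size=1)  # the only trainable part
        for p in list(self.bottom.parameters()) + list(self.top.parameters()):
            p.requires_grad = False

    def forward(self, x):
        with torch.no_grad():
            h = self.bottom(x)
        return self.top(self.stitch(h))
```

Training then optimizes only `self.stitch` with the ordinary task loss; a small drop in stitched accuracy is read as evidence that the two networks' internal representations are compatible.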
no code implementations • NeurIPS 2021 • Preetum Nakkiran, Yamini Bansal
We present a new set of empirical properties of interpolating classifiers, including neural networks, kernel machines and decision trees.
no code implementations • ICLR 2021 • Preetum Nakkiran, Behnam Neyshabur, Hanie Sedghi
We propose a new framework for reasoning about generalization in deep learning.
2 code implementations • 16 Oct 2020 • Preetum Nakkiran, Behnam Neyshabur, Hanie Sedghi
We propose a new framework for reasoning about generalization in deep learning.
1 code implementation • 17 Sep 2020 • Preetum Nakkiran, Yamini Bansal
We introduce a new notion of generalization -- Distributional Generalization -- which roughly states that outputs of a classifier at train and test time are close *as distributions*, as opposed to close in just their average error.
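One crude, illustrative probe in this spirit: compare the distribution of a classifier's predicted labels on train versus test inputs, rather than only its average error (synthetic predictions stand in for a real classifier here).

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes = 10
preds_train = rng.choice(n_classes, size=50_000)   # hypothetical predictions on train inputs
preds_test = rng.choice(n_classes, size=10_000)    # hypothetical predictions on test inputs

def label_dist(preds):
    """Empirical distribution of predicted labels."""
    return np.bincount(preds, minlength=n_classes) / len(preds)

# A small total-variation distance on this statistic is consistent with
# (though far from equivalent to) distributional generalization.
tv = 0.5 * np.abs(label_dist(preds_train) - label_dist(preds_test)).sum()
print("TV between train/test output distributions:", tv)
```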
no code implementations • 15 May 2020 • Preetum Nakkiran
The learning rate schedule can significantly affect generalization performance in modern neural networks, but the reasons for this are not yet understood.
no code implementations • ICLR 2021 • Preetum Nakkiran, Prayaag Venkat, Sham Kakade, Tengyu Ma
Recent empirical and theoretical studies have shown that many learning algorithms -- from linear regression to neural networks -- can have test performance that is non-monotonic in quantities such as the sample size and model size.
1 code implementation • 16 Dec 2019 • Preetum Nakkiran
In this expository note we describe a surprising phenomenon in overparameterized linear regression, where the dimension exceeds the number of samples: there is a regime where the test risk of the estimator found by gradient descent increases with additional samples.
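A small NumPy sketch of the phenomenon: gradient descent from zero on the squared loss converges to the minimum-norm interpolant (the pseudoinverse solution), whose test risk can increase as the number of samples n approaches the dimension d. (Synthetic toy, not the note's exact figures.)

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma = 200, 0.5
beta = rng.normal(size=d) / np.sqrt(d)        # ground-truth coefficients

def test_risk(n, n_test=5_000):
    X = rng.normal(size=(n, d))
    y = X @ beta + sigma * rng.normal(size=n)
    beta_hat = np.linalg.pinv(X) @ y          # minimum-norm least-squares fit
    X_test = rng.normal(size=(n_test, d))
    return np.mean((X_test @ (beta_hat - beta)) ** 2)

for n in [20, 50, 100, 150, 190, 199]:
    print(n, round(test_risk(n), 3))          # risk blows up as n approaches d
```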
3 code implementations • ICLR 2020 • Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever
We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better.
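A toy random-features analogue of model-size double descent (an illustrative stand-in for the paper's deep-learning experiments): with a fixed random ReLU feature map and a minimum-norm second layer, test error typically peaks when the width is near the number of training samples and then improves again.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, sigma = 20, 100, 0.2
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d) / np.sqrt(d)
y = X @ w_true + sigma * rng.normal(size=n)
X_test = rng.normal(size=(2_000, d))
y_test = X_test @ w_true

def random_features_risk(width):
    W = rng.normal(size=(d, width)) / np.sqrt(d)     # fixed random first layer
    Phi, Phi_test = np.maximum(X @ W, 0), np.maximum(X_test @ W, 0)
    a = np.linalg.pinv(Phi) @ y                      # minimum-norm second layer
    return np.mean((Phi_test @ a - y_test) ** 2)

for width in [10, 50, 90, 100, 110, 200, 1_000]:
    print(width, round(random_features_risk(width), 3))   # error peaks near width = n
```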
1 code implementation • NeurIPS 2019 • Preetum Nakkiran, Gal Kaplun, Dimitris Kalimeris, Tristan Yang, Benjamin L. Edelman, Fred Zhang, Boaz Barak
We perform an experimental study of the dynamics of Stochastic Gradient Descent (SGD) in learning deep neural networks for several real and synthetic classification tasks.
no code implementations • 4 Feb 2019 • Akshay Degwekar, Preetum Nakkiran, Vinod Vaikuntanathan
We continue the study of statistical/computational tradeoffs in learning robust classifiers, following the recent work of Bubeck, Lee, Price, and Razenshteyn, who showed examples of classification tasks where (a) an efficient robust classifier exists in the small-perturbation regime; (b) a non-robust classifier can be learned efficiently; but (c) it is computationally hard to learn a robust classifier, assuming the hardness of factoring large numbers.
no code implementations • 2 Jan 2019 • Preetum Nakkiran
In this note, we show that this hypothesis is indeed possible, by giving several theoretical examples of classification tasks and sets of "simple" classifiers for which: (1) There exists a simple classifier with high standard accuracy, and also high accuracy under random $\ell_\infty$ noise.
no code implementations • 14 Sep 2018 • Preetum Nakkiran, Jarosław Błasiok
In this work, we propose a new framework for adaptive science which exponentially improves on this number of queries under a restricted yet scientifically relevant setting, where the goal of the scientist is to find a single (or a few) true hypotheses about the universe based on the samples.
1 code implementation • 19 Sep 2017 • Charalampos E. Tsourakakis, Michael Mitzenmacher, Kasper Green Larsen, Jarosław Błasiok, Ben Lawson, Preetum Nakkiran, Vasileios Nakos
The *edge sign prediction problem* aims to predict whether an interaction between a pair of nodes will be positive or negative.