no code implementations • 7 Dec 2023 • Vimal Thilak, Chen Huang, Omid Saremi, Laurent Dinh, Hanlin Goh, Preetum Nakkiran, Joshua M. Susskind, Etai Littwin
In this paper, we introduce LiDAR (Linear Discriminant Analysis Rank), a metric designed to measure the quality of representations within joint-embedding (JE) architectures.
no code implementations • 7 Dec 2023 • Micah Goldblum, Anima Anandkumar, Richard Baraniuk, Tom Goldstein, Kyunghyun Cho, Zachary C. Lipton, Melanie Mitchell, Preetum Nakkiran, Max Welling, Andrew Gordon Wilson
The goal of this series is to chronicle opinions and issues in the field of machine learning as they stand today and as they change over time.
1 code implementation • 31 Oct 2023 • Noam Razin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Joshua Susskind, Etai Littwin
Pretrained language models are commonly aligned with human preferences and downstream tasks via reinforcement finetuning (RFT), which refers to maximizing a (possibly learned) reward function using policy gradient algorithms.
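As a toy illustration of the policy-gradient objective behind RFT, here is a minimal REINFORCE sketch on a small categorical "policy" with a fixed reward vector; the five-action policy and reward are stand-ins, not the paper's language-model setup.

```python
import torch

# Toy REINFORCE sketch: maximize a fixed reward by increasing the
# log-probability of high-reward samples. A stand-in for reinforcement
# finetuning of a language model, not the paper's actual setup.
logits = torch.zeros(5, requires_grad=True)        # toy "policy" over 5 actions
reward = torch.tensor([0.0, 0.0, 1.0, 0.0, 0.0])   # only action 2 is rewarded
opt = torch.optim.Adam([logits], lr=0.1)

for _ in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    a = dist.sample()
    loss = -reward[a] * dist.log_prob(a)           # REINFORCE estimator of -E[reward]
    opt.zero_grad()
    loss.backward()
    opt.step()

print(torch.softmax(logits, dim=0))                # mass concentrates on action 2
```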
no code implementations • 24 Oct 2023 • Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran
Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity.
1 code implementation • 21 Sep 2023 • Jarosław Błasiok, Preetum Nakkiran
We show that a simple modification fixes both constructions: first smooth the observations using an RBF kernel, then compute the Expected Calibration Error (ECE) of this smoothed function.
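A rough NumPy sketch of the smooth-then-measure idea: kernel-regress the outcomes on the predicted confidences with an RBF kernel, then take the average gap between each prediction and the smoothed outcome at that prediction. This is only an illustration, not the authors' exact estimator.

```python
import numpy as np

def smoothed_ece(confidences, outcomes, sigma=0.05):
    """Sketch: RBF-kernel smoothing of outcomes as a function of confidence,
    followed by the average calibration gap of the smoothed function.
    Illustrative only; not the paper's exact estimator."""
    confidences = np.asarray(confidences, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)

    # RBF (Gaussian) kernel weights between every pair of predictions.
    diffs = confidences[:, None] - confidences[None, :]
    weights = np.exp(-0.5 * (diffs / sigma) ** 2)

    # Nadaraya-Watson estimate of E[outcome | prediction] at each point.
    smoothed = (weights @ outcomes) / weights.sum(axis=1)

    # Average gap between prediction and smoothed outcome.
    return np.mean(np.abs(smoothed - confidences))

# Example: well-calibrated synthetic predictions give a value close to 0.
rng = np.random.default_rng(0)
p = rng.uniform(size=2_000)
y = (rng.uniform(size=2_000) < p).astype(float)
print(smoothed_ece(p, y))
```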
no code implementations • NeurIPS 2023 • Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Preetum Nakkiran
Optimizing proper loss functions is popularly believed to yield predictors with good calibration properties; the intuition being that for such losses, the global optimum is to predict the ground-truth probabilities, which is indeed calibrated.
no code implementations • 19 Apr 2023 • Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Adam Tauman Kalai, Preetum Nakkiran
We show that minimizing the squared loss over all neural nets of size $n$ implies multicalibration for all but a bounded number of unlucky values of $n$.
no code implementations • 30 Nov 2022 • Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Preetum Nakkiran
We study the fundamental question of how to define and measure the distance from calibration for probabilistic predictors.
no code implementations • 8 Oct 2022 • Elan Rosenfeld, Preetum Nakkiran, Hadi Pouransari, Oncel Tuzel, Fartash Faghri
Recent advances in learning aligned multimodal representations have been primarily driven by training large neural networks on massive, noisy paired-modality datasets.
1 code implementation • 5 Oct 2022 • A. Michael Carrell, Neil Mallinar, James Lucas, Preetum Nakkiran
We propose a systematic way to study the calibration error: by decomposing it into (1) calibration error on the train set, and (2) the calibration generalization gap.
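A minimal sketch of this decomposition, using a standard binned ECE helper and synthetic confidences and correctness indicators standing in for a real model's train and test outputs:

```python
import numpy as np

def binned_ece(conf, correct, n_bins=15):
    """Standard binned expected calibration error (illustrative helper)."""
    conf, correct = np.asarray(conf), np.asarray(correct, dtype=float)
    bins = np.clip((conf * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece

# Hypothetical model outputs: calibrated on train, overconfident on test.
rng = np.random.default_rng(0)
conf_train = rng.uniform(0.5, 1.0, 20_000)
correct_train = (rng.uniform(size=20_000) < conf_train).astype(float)
conf_test = rng.uniform(0.5, 1.0, 20_000)
correct_test = (rng.uniform(size=20_000) < conf_test - 0.1).astype(float)

train_ece = binned_ece(conf_train, correct_train)      # (1) train calibration error
test_ece = binned_ece(conf_test, correct_test)
gap = test_ece - train_ece                             # (2) calibration generalization gap
print(train_ece, test_ece, gap)
```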
no code implementations • 14 Jul 2022 • Neil Mallinar, James B. Simon, Amirhesam Abedsoltan, Parthe Pandit, Mikhail Belkin, Preetum Nakkiran
In this work we argue that while benign overfitting has been instructive and fruitful to study, many real interpolating methods, such as neural networks, do not fit benignly: modest noise in the training set causes nonzero (but non-infinite) excess risk at test time, implying that these models are neither benign nor catastrophic but rather fall in an intermediate regime.
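A toy illustration of such an intermediate ("tempered") regime, using 1-nearest-neighbor as the interpolating method on synthetic data with label noise (not the paper's experiments):

```python
import numpy as np

# 1-NN fits the noisy train labels exactly, yet its clean test error lands
# roughly at the label-noise rate: nonzero excess risk, but far from chance.
rng = np.random.default_rng(0)
d, noise = 10, 0.1

def make_data(n, clean=False):
    X = rng.normal(size=(n, d))
    y = (X[:, 0] > 0).astype(int)                       # noiseless target
    if not clean:
        y = np.where(rng.uniform(size=n) < noise, 1 - y, y)
    return X, y

X_tr, y_tr = make_data(2_000)
X_te, y_te = make_data(500, clean=True)

# 1-NN prediction via pairwise squared distances.
d2 = (X_te ** 2).sum(1)[:, None] + (X_tr ** 2).sum(1)[None, :] - 2 * X_te @ X_tr.T
pred = y_tr[np.argmin(d2, axis=1)]
print("clean test error:", np.mean(pred != y_te))       # nonzero, nowhere near 0.5
```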
no code implementations • 20 Jun 2022 • Nikhil Vyas, Yamini Bansal, Preetum Nakkiran
The "Neural Tangent Kernel" (NTK; Jacot et al., 2018) and its empirical variants have been proposed as proxies to capture certain behaviors of real neural networks.
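For concreteness, a minimal sketch of the *empirical* NTK, which for two inputs is the inner product of the network's parameter gradients at those inputs (a standard definition, not the paper's experimental setup):

```python
import torch
import torch.nn as nn

# Small MLP standing in for "a real neural network".
net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

def grad_vector(x):
    """Flattened gradient of the scalar network output with respect to all parameters."""
    net.zero_grad()
    net(x.unsqueeze(0)).sum().backward()
    return torch.cat([p.grad.reshape(-1) for p in net.parameters()])

x1, x2 = torch.randn(10), torch.randn(10)
k12 = grad_vector(x1) @ grad_vector(x2)   # empirical NTK entry K(x1, x2)
print(k12.item())
```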
1 code implementation • 7 Apr 2022 • Bogdan Kulynych, Yao-Yuan Yang, Yaodong Yu, Jarosław Błasiok, Preetum Nakkiran
In contrast, we show that Differentially-Private (DP) training provably ensures the high-level WYSIWYG property, which we quantify using a notion of distributional generalization.
no code implementations • 28 Mar 2022 • Gal Kaplun, Eran Malach, Preetum Nakkiran, Shai Shalev-Shwartz
We relate the notion of such samplers to knowledge distillation, where a student network imitates the outputs of a teacher on unlabeled data.
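A sketch of the standard distillation loss this refers to, where the student matches the teacher's temperature-softened output distribution on unlabeled inputs (a common recipe, not necessarily the paper's exact objective):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    t = temperature
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    teacher_p = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(student_logp, teacher_p, reduction="batchmean") * (t * t)

# Toy usage: random logits stand in for the two networks' outputs on a batch.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
print(distillation_loss(student_logits, teacher_logits))
```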
1 code implementation • 20 Feb 2022 • Gal Kaplun, Nikhil Ghosh, Saurabh Garg, Boaz Barak, Preetum Nakkiran
In this work, we propose a new approach: we measure the performance of a collection of models when evaluated on a *single input point*.
no code implementations • 17 Feb 2022 • Like Hui, Mikhail Belkin, Preetum Nakkiran
We refine the Neural Collapse conjecture into two separate conjectures: collapse on the train set (an optimization property) and collapse on the test distribution (a generalization property).
no code implementations • 9 Nov 2021 • Preetum Nakkiran
For a given distribution, learning algorithm, and performance metric, the rate of convergence (or data-scaling law) is the asymptotic behavior of the algorithm's test performance as a function of the number of training samples.
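A minimal sketch of estimating such a rate empirically: fit a power law to (hypothetical) test-error measurements at increasing sample sizes via least squares in log-log space.

```python
import numpy as np

# Hypothetical measurements: test error at increasing train-set sizes n.
ns = np.array([1_000, 2_000, 4_000, 8_000, 16_000, 32_000])
errs = np.array([0.210, 0.158, 0.117, 0.088, 0.066, 0.049])

# Fit err ~ c * n^(-alpha); the slope in log-log space gives the scaling exponent.
slope, log_c = np.polyfit(np.log(ns), np.log(errs), 1)
alpha = -slope
print(f"error ~ {np.exp(log_c):.2f} * n^(-{alpha:.2f})")
```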
no code implementations • 29 Sep 2021 • Preetum Nakkiran, Yamini Bansal
Classifiers in machine learning are often reduced to single scalar quantities, such as test error or loss.
no code implementations • NeurIPS 2021 • Yamini Bansal, Preetum Nakkiran, Boaz Barak
We revisit and extend model stitching (Lenc & Vedaldi 2015) as a methodology to study the internal representations of neural networks.
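A minimal PyTorch sketch of model stitching in the usual convolutional setup; `bottom_a` and `top_b` are hypothetical frozen sub-networks, and only the 1x1-conv stitching layer is trained.

```python
import torch
import torch.nn as nn

class Stitched(nn.Module):
    """Model stitching sketch: frozen bottom layers of network A, a trainable
    1x1-conv stitching layer, frozen top layers of network B. `bottom_a` must
    produce, and `top_b` must consume, convolutional feature maps."""

    def __init__(self, bottom_a, top_b, c_in, c_out):
        super().__init__()
        self.bottom, self.top = bottom_a, top_b
        self.stitch = nn.Conv2d(c_in, c_out, kernel_size=1)  # the only trainable part
        for p in list(self.bottom.parameters()) + list(self.top.parameters()):
            p.requires_grad = False

    def forward(self, x):
        with torch.no_grad():
            h = self.bottom(x)
        return self.top(self.stitch(h))
```

Training then optimizes only `self.stitch` with the ordinary task loss; a small drop in stitched accuracy is read as evidence that the two networks' internal representations are compatible.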
no code implementations • NeurIPS 2021 • Preetum Nakkiran, Yamini Bansal
We present a new set of empirical properties of interpolating classifiers, including neural networks, kernel machines and decision trees.
no code implementations • ICLR 2021 • Preetum Nakkiran, Behnam Neyshabur, Hanie Sedghi
We propose a new framework for reasoning about generalization in deep learning.
2 code implementations • 16 Oct 2020 • Preetum Nakkiran, Behnam Neyshabur, Hanie Sedghi
We propose a new framework for reasoning about generalization in deep learning.
1 code implementation • 17 Sep 2020 • Preetum Nakkiran, Yamini Bansal
We introduce a new notion of generalization -- Distributional Generalization -- which roughly states that outputs of a classifier at train and test time are close *as distributions*, as opposed to close in just their average error.
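One crude, illustrative probe in this spirit: compare the distribution of a classifier's predicted labels on train versus test inputs, rather than only its average error (synthetic predictions stand in for a real classifier here).

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes = 10
preds_train = rng.choice(n_classes, size=50_000)   # hypothetical predictions on train inputs
preds_test = rng.choice(n_classes, size=10_000)    # hypothetical predictions on test inputs

def label_dist(preds):
    """Empirical distribution of predicted labels."""
    return np.bincount(preds, minlength=n_classes) / len(preds)

# A small total-variation distance on this statistic is consistent with
# (though far from equivalent to) distributional generalization.
tv = 0.5 * np.abs(label_dist(preds_train) - label_dist(preds_test)).sum()
print("TV between train/test output distributions:", tv)
```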
no code implementations • 15 May 2020 • Preetum Nakkiran
The learning rate schedule can significantly affect generalization performance in modern neural networks, but the reasons for this are not yet understood.
no code implementations • ICLR 2021 • Preetum Nakkiran, Prayaag Venkat, Sham Kakade, Tengyu Ma
Recent empirical and theoretical studies have shown that many learning algorithms -- from linear regression to neural networks -- can have test performance that is non-monotonic in quantities such as the sample size and model size.
1 code implementation • 16 Dec 2019 • Preetum Nakkiran
In this expository note we describe a surprising phenomenon in overparameterized linear regression, where the dimension exceeds the number of samples: there is a regime where the test risk of the estimator found by gradient descent increases with additional samples.
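A small NumPy sketch of the phenomenon: gradient descent from zero on the squared loss converges to the minimum-norm interpolant (the pseudoinverse solution), whose test risk can increase as the number of samples n approaches the dimension d. (Synthetic toy, not the note's exact figures.)

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma = 200, 0.5
beta = rng.normal(size=d) / np.sqrt(d)        # ground-truth coefficients

def test_risk(n, n_test=5_000):
    X = rng.normal(size=(n, d))
    y = X @ beta + sigma * rng.normal(size=n)
    beta_hat = np.linalg.pinv(X) @ y          # minimum-norm least-squares fit
    X_test = rng.normal(size=(n_test, d))
    return np.mean((X_test @ (beta_hat - beta)) ** 2)

for n in [20, 50, 100, 150, 190, 199]:
    print(n, round(test_risk(n), 3))          # risk blows up as n approaches d
```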
3 code implementations • ICLR 2020 • Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever
We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better.
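A toy random-features analogue of model-size double descent (an illustrative stand-in for the paper's deep-learning experiments): with a fixed random ReLU feature map and a minimum-norm second layer, test error typically peaks when the width is near the number of training samples and then improves again.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, sigma = 20, 100, 0.2
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d) / np.sqrt(d)
y = X @ w_true + sigma * rng.normal(size=n)
X_test = rng.normal(size=(2_000, d))
y_test = X_test @ w_true

def random_features_risk(width):
    W = rng.normal(size=(d, width)) / np.sqrt(d)     # fixed random first layer
    Phi, Phi_test = np.maximum(X @ W, 0), np.maximum(X_test @ W, 0)
    a = np.linalg.pinv(Phi) @ y                      # minimum-norm second layer
    return np.mean((Phi_test @ a - y_test) ** 2)

for width in [10, 50, 90, 100, 110, 200, 1_000]:
    print(width, round(random_features_risk(width), 3))   # error peaks near width = n
```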
1 code implementation • NeurIPS 2019 • Preetum Nakkiran, Gal Kaplun, Dimitris Kalimeris, Tristan Yang, Benjamin L. Edelman, Fred Zhang, Boaz Barak
We perform an experimental study of the dynamics of Stochastic Gradient Descent (SGD) in learning deep neural networks for several real and synthetic classification tasks.
no code implementations • 4 Feb 2019 • Akshay Degwekar, Preetum Nakkiran, Vinod Vaikuntanathan
We continue the study of statistical/computational tradeoffs in learning robust classifiers, following the recent work of Bubeck, Lee, Price, and Razenshteyn, who showed examples of classification tasks where (a) an efficient robust classifier exists in the small-perturbation regime; (b) a non-robust classifier can be learned efficiently; but (c) it is computationally hard to learn a robust classifier, assuming the hardness of factoring large numbers.
no code implementations • 2 Jan 2019 • Preetum Nakkiran
In this note, we show that this hypothesis is indeed possible, by giving several theoretical examples of classification tasks and sets of "simple" classifiers for which: (1) There exists a simple classifier with high standard accuracy, and also high accuracy under random $\ell_\infty$ noise.
no code implementations • 14 Sep 2018 • Preetum Nakkiran, Jarosław Błasiok
In this work, we propose a new framework for adaptive science which exponentially improves on this number of queries under a restricted yet scientifically relevant setting, where the goal of the scientist is to find a single (or a few) true hypotheses about the universe based on the samples.
1 code implementation • 19 Sep 2017 • Charalampos E. Tsourakakis, Michael Mitzenmacher, Kasper Green Larsen, Jarosław Błasiok, Ben Lawson, Preetum Nakkiran, Vasileios Nakos
The *edge sign prediction problem* aims to predict whether an interaction between a pair of nodes will be positive or negative.