no code implementations • 29 Apr 2024 • Song Mei
U-Nets are among the most widely used architectures in computer vision, renowned for their exceptional performance in applications such as image segmentation, denoising, and diffusion modeling.
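For readers unfamiliar with the architecture, a U-Net pairs a downsampling encoder with an upsampling decoder and concatenates encoder features into the decoder through skip connections. A minimal PyTorch sketch (layer sizes are illustrative choices, not taken from the paper):

```python
# Minimal U-Net sketch (illustrative sizes, not from the paper): an encoder
# downsamples, a decoder upsamples, and a skip connection concatenates
# encoder features into the decoder at the matching resolution.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(),
    )

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(1, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)   # 64 = 32 (skip) + 32 (upsampled)
        self.out = nn.Conv2d(32, 1, 1)

    def forward(self, x):
        e1 = self.enc1(x)                # full resolution
        e2 = self.enc2(self.pool(e1))    # half resolution
        d1 = self.up(e2)                 # back to full resolution
        d1 = self.dec1(torch.cat([d1, e1], dim=1))  # skip connection
        return self.out(d1)

x = torch.randn(1, 1, 32, 32)
print(TinyUNet()(x).shape)  # torch.Size([1, 1, 32, 32])
```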
no code implementations • 11 Apr 2024 • Minshuo Chen, Song Mei, Jianqing Fan, Mengdi Wang
In this paper, we review emerging applications of diffusion models, focusing on how they generate samples under various forms of control.
no code implementations • 8 Apr 2024 • Ruiqi Zhang, Licong Lin, Yu Bai, Song Mei
LLM unlearning aims to eliminate the influence of undesirable data from a pre-trained model while preserving its utility on other tasks.
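As a point of reference, a common unlearning baseline alternates gradient ascent on the data to be forgotten with a penalty that preserves performance on retained data. This is a generic sketch, not the procedure analyzed in the paper; `model`, `loss_fn`, and the batch arguments are hypothetical placeholders:

```python
# Hedged sketch of a standard unlearning baseline (gradient ascent on the
# forget set plus a retain-set penalty), NOT the method studied in the paper.
def unlearn_step(model, loss_fn, forget_batch, retain_batch, opt, lam=1.0):
    xf, yf = forget_batch
    xr, yr = retain_batch
    # Ascend on the forget loss (hence the minus sign) while penalizing
    # any degradation on the retain data.
    loss = -loss_fn(model(xf), yf) + lam * loss_fn(model(xr), yr)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```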
no code implementations • 14 Nov 2023 • Michael Celentano, Zhou Fan, Licong Lin, Song Mei
In settings where it is conjectured that no efficient algorithm can find this local neighborhood, we prove analogous geometric properties for a local minimizer of the TAP free energy reachable by AMP, and show that posterior inference based on this minimizer remains correctly calibrated.
no code implementations • 16 Oct 2023 • Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai
Through extensive probing and a new pasting experiment, we further reveal several mechanisms within the trained transformers, such as concrete copying behaviors on both the inputs and the representations, linear ICL capability of the upper layers alone, and a post-ICL representation selection mechanism in a harder mixture setting.
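Probing here means fitting a simple readout on frozen intermediate representations to test what a layer encodes linearly. A minimal sketch, with `reps` and `labels` as hypothetical arrays of hidden states and probe targets:

```python
# Linear probing sketch: fit a ridge-regression readout on frozen hidden
# states; the probe's error measures how linearly decodable the target is.
import numpy as np

def fit_linear_probe(reps, labels, lam=1e-3):
    # reps: (num_examples, hidden_dim) frozen representations;
    # labels: (num_examples,) probe targets.
    d = reps.shape[1]
    w = np.linalg.solve(reps.T @ reps + lam * np.eye(d), reps.T @ labels)
    return w  # evaluate via the error of reps @ w on held-out examples
```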
no code implementations • 12 Oct 2023 • Licong Lin, Yu Bai, Song Mei
This provides the first quantitative analysis of the ICRL capabilities of transformers pretrained from offline trajectories.
no code implementations • 20 Sep 2023 • Song Mei, Yuchen Wu
We investigate the approximation efficiency of score functions by deep neural networks in diffusion-based generative modeling.
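The score function is $\nabla_x \log p_t(x)$, and in practice a network is fit to it via denoising score matching. A minimal sketch (the Gaussian noising and the `score_net` interface are illustrative assumptions):

```python
# Denoising score matching sketch: for x_t = x_0 + sigma * eps, the score of
# the noised distribution N(x_0, sigma^2 I) at x_t is -(x_t - x_0) / sigma^2
# = -eps / sigma, so the network is regressed onto that target.
import torch

def dsm_loss(score_net, x0, sigma):
    eps = torch.randn_like(x0)
    xt = x0 + sigma * eps
    target = -eps / sigma            # score of N(x0, sigma^2 I) at xt
    return ((score_net(xt, sigma) - target) ** 2).mean()
```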
no code implementations • 2 Feb 2023 • Fan Chen, Huan Wang, Caiming Xiong, Song Mei, Yu Bai
However, the fundamental limits for learning in revealing POMDPs are much less understood, with existing lower bounds being rather preliminary and having substantial gaps from the current best upper bounds.
1 code implementation • 4 Nov 2022 • Taejoo Ahn, Licong Lin, Song Mei
In this paper, we develop near-optimal multiple testing procedures for high dimensional Bayesian linear models with isotropic covariates.
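For contrast with the Bayesian procedures developed in the paper, the classical frequentist baseline is the Benjamini-Hochberg step-up procedure over p-values:

```python
# Benjamini-Hochberg step-up procedure (the classical p-value baseline; the
# paper's procedures are Bayesian and use posterior quantities instead).
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    n = len(pvals)
    order = np.argsort(pvals)
    thresh = alpha * np.arange(1, n + 1) / n
    below = pvals[order] <= thresh
    # Reject the k smallest p-values, where k is the largest index with
    # p_(k) <= alpha * k / n.
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = np.zeros(n, dtype=bool)
    rejected[order[:k]] = True
    return rejected
```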
no code implementations • 29 Sep 2022 • Fan Chen, Yu Bai, Song Mei
Recent work has identified several tractable subclasses that are learnable with polynomial samples, such as Partially Observable Markov Decision Processes (POMDPs) with certain revealing or decodability conditions.
no code implementations • 23 Sep 2022 • Fan Chen, Song Mei, Yu Bai
We make progress on this question by developing a unified algorithm framework for a large class of learning goals, building on the Decision-Estimation Coefficient (DEC) framework.
no code implementations • 30 May 2022 • Yu Bai, Chi Jin, Song Mei, Ziang Song, Tiancheng Yu
A conceptually appealing approach for learning Extensive-Form Games (EFGs) is to convert them to Normal-Form Games (NFGs).
no code implementations • 15 May 2022 • Ziang Song, Song Mei, Yu Bai
We then design an uncoupled no-regret algorithm that finds an $\varepsilon$-approximate $K$-EFCE within $\widetilde{\mathcal{O}}(\max_{i}X_iA_i^{K}/\varepsilon^2)$ iterations in the full feedback setting, where $X_i$ and $A_i$ are the number of information sets and actions for the $i$-th player.
1 code implementation • ICLR 2022 • Yu Bai, Song Mei, Huan Wang, Yingbo Zhou, Caiming Xiong
Experiments show that our algorithm learns valid prediction sets and significantly improves efficiency over existing approaches in several applications, such as shorter prediction intervals, minimum-volume prediction sets for multi-output regression, and label prediction sets for image classification.
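For context, the classical split conformal baseline that such methods improve upon fits in a few lines; `resid_cal` here is assumed to hold absolute residuals from a held-out calibration split:

```python
# Split conformal prediction baseline (the classical reference method, not
# the learned procedure proposed in the paper).
import numpy as np

def split_conformal_interval(resid_cal, y_pred_test, alpha=0.1):
    # resid_cal: absolute residuals |y - model(x)| on a calibration split.
    n = len(resid_cal)
    # Finite-sample corrected quantile level gives >= 1 - alpha coverage.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(resid_cal, level)
    return y_pred_test - q, y_pred_test + q
```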
no code implementations • 3 Feb 2022 • Yu Bai, Chi Jin, Song Mei, Tiancheng Yu
This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$, and matches the information-theoretic lower bound up to logarithmic factors.
no code implementations • 16 Nov 2021 • Theodor Misiakiewicz, Song Mei
Recent empirical work has shown that hierarchical convolutional kernels inspired by convolutional neural networks (CNNs) significantly improve the performance of kernel methods in image classification tasks.
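The simplest instance of such a kernel averages a base kernel over aligned local patches of the two inputs; a one-layer sketch (the patch size and quadratic base kernel are my illustrative choices, while the paper studies hierarchical compositions of such maps):

```python
# One-layer convolutional kernel sketch: average a base kernel over
# matching local patches of two 1-D signals of equal length.
import numpy as np

def conv_kernel(x, y, q=3):
    # q: patch size; the base kernel here is the quadratic kernel <u, v>^2.
    n = len(x) - q + 1
    vals = [np.dot(x[i:i+q], y[i:i+q]) ** 2 for i in range(n)]
    return np.mean(vals)
```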
no code implementations • ICLR 2022 • Nikhil Ghosh, Song Mei, Bin Yu
To understand how deep learning works, it is crucial to understand the training dynamics of neural networks.
no code implementations • ICLR 2022 • Ziang Song, Song Mei, Yu Bai
First, we design algorithms for learning an $\epsilon$-Coarse Correlated Equilibrium (CCE) in $\widetilde{\mathcal{O}}(H^5S\max_{i\le m} A_i / \epsilon^2)$ episodes, and an $\epsilon$-Correlated Equilibrium (CE) in $\widetilde{\mathcal{O}}(H^6S\max_{i\le m} A_i^2 / \epsilon^2)$ episodes.
no code implementations • 29 Sep 2021 • Lin Chen, Song Mei
Moreover, we theoretically show that the ridge estimator with optimal regularization can result in a monotone generalization risk curve and thereby eliminate multiple descent under some assumptions.
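The effect is easy to reproduce numerically (dimensions, noise level, and the near-zero ridge value below are my own choices, not the paper's setting): near the interpolation threshold $n \approx d$, the barely regularized estimator's risk spikes, while a tuned ridge penalty keeps the curve monotone.

```python
# Toy simulation: test risk of ridge regression across sample sizes, with a
# near-zero penalty (double-descent spike near n = d) vs. a tuned penalty.
import numpy as np

rng = np.random.default_rng(0)
d, n_test, sigma = 50, 2000, 0.5
beta = rng.normal(size=d) / np.sqrt(d)
Xte = rng.normal(size=(n_test, d))
yte = Xte @ beta + sigma * rng.normal(size=n_test)

for n in [25, 50, 100, 200]:
    X = rng.normal(size=(n, d))
    y = X @ beta + sigma * rng.normal(size=n)
    for lam in [1e-8, 1.0]:  # near-interpolation vs. tuned ridge
        bhat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
        risk = np.mean((Xte @ bhat - yte) ** 2)
        print(f"n={n:4d}  lambda={lam:g}  test risk={risk:.3f}")
```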
no code implementations • 21 Jun 2021 • Michael Celentano, Zhou Fan, Song Mei
This provides a rigorous foundation for variational inference in high dimensions via minimization of the TAP free energy.
no code implementations • NeurIPS 2021 • Yu Bai, Song Mei, Huan Wang, Caiming Xiong
Estimating the data uncertainty in regression tasks is often done by learning a quantile function or a prediction interval of the true label conditioned on the input.
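Quantile functions are typically learned by minimizing the pinball loss, whose population minimizer is the $\tau$-th conditional quantile; a minimal sketch:

```python
# Pinball (quantile) loss: minimizing it in expectation recovers the tau-th
# conditional quantile of y given x. A pair of quantile heads (e.g.
# tau = 0.05 and 0.95) then yields a 90% prediction interval.
import torch

def pinball_loss(pred, y, tau):
    diff = y - pred
    return torch.mean(torch.maximum(tau * diff, (tau - 1) * diff))
```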
no code implementations • 8 Mar 2021 • Zitong Yang, Yu Bai, Song Mei
We show that, in the setting where the classical uniform convergence bound is vacuous (diverges to $\infty$), uniform convergence over the interpolators still gives a non-trivial bound of the test error of interpolating solutions.
no code implementations • 25 Feb 2021 • Song Mei, Theodor Misiakiewicz, Andrea Montanari
Certain neural network architectures -- for instance, convolutional networks -- are believed to owe their success to the fact that they exploit such invariance properties.
no code implementations • 15 Feb 2021 • Yu Bai, Song Mei, Huan Wang, Caiming Xiong
Modern machine learning models with high accuracy are often miscalibrated -- the predicted top probability does not reflect the actual accuracy, and tends to be over-confident.
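Miscalibration is usually quantified by the expected calibration error (ECE), which bins predictions by confidence and compares average confidence to empirical accuracy within each bin; a minimal sketch (the 15-bin choice is a common convention, not from the paper):

```python
# Expected calibration error: weighted average, over confidence bins, of the
# gap between mean confidence and empirical accuracy.
import numpy as np

def ece(confidences, correct, n_bins=15):
    # confidences: predicted top probabilities; correct: 0/1 indicators.
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            total += mask.mean() * gap
    return total
```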
no code implementations • 26 Jan 2021 • Song Mei, Theodor Misiakiewicz, Andrea Montanari
We show that the test error of random features ridge regression is dominated by its approximation error and is larger than the error of KRR as long as $N\le n^{1-\delta}$ for some $\delta>0$.
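Here $N$ counts the random features and $n$ the samples. A minimal random features ridge regression sketch (the ReLU activation and problem sizes are illustrative choices):

```python
# Random features ridge regression: project inputs through N fixed random
# ReLU features, then solve ridge regression in feature space.
import numpy as np

rng = np.random.default_rng(0)
n, d, N, lam = 500, 20, 200, 1e-2
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

W = rng.normal(size=(d, N)) / np.sqrt(d)   # fixed random first layer
Phi = np.maximum(X @ W, 0.0)               # ReLU random features
a = np.linalg.solve(Phi.T @ Phi + lam * n * np.eye(N), Phi.T @ y)
print("train MSE:", np.mean((Phi @ a - y) ** 2))
```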
1 code implementation • NeurIPS 2020 • Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari
Recent empirical work showed that, for some classification tasks, RKHS methods can replace NNs without a large loss in performance.
1 code implementation • NeurIPS 2019 • Song Mei, Theodor Misiakiewicz, Behrooz Ghorbani, Andrea Montanari
We study the supervised learning problem under either of the following two models: (1) Feature vectors ${\boldsymbol x}_i$ are $d$-dimensional Gaussians and responses are $y_i = f_*({\boldsymbol x}_i)$ for $f_*$ an unknown quadratic function; (2) Feature vectors ${\boldsymbol x}_i$ are distributed as a mixture of two $d$-dimensional centered Gaussians, and $y_i$'s are the corresponding class labels.
no code implementations • 14 Aug 2019 • Song Mei, Andrea Montanari
We compute the precise asymptotics of the test error, in the limit $N, n, d\to \infty$ with $N/d$ and $n/d$ fixed.
1 code implementation • 21 Jun 2019 • Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari
We study the supervised learning problem under either of the following two models: (1) Feature vectors ${\boldsymbol x}_i$ are $d$-dimensional Gaussians and responses are $y_i = f_*({\boldsymbol x}_i)$ for $f_*$ an unknown quadratic function; (2) Feature vectors ${\boldsymbol x}_i$ are distributed as a mixture of two $d$-dimensional centered Gaussians, and $y_i$'s are the corresponding class labels.
no code implementations • 27 Apr 2019 • Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari
Both these approaches can also be regarded as randomized approximations of kernel ridge regression (with respect to different kernels), and enjoy universal approximation properties when the number of neurons $N$ diverges, for a fixed dimension $d$.
no code implementations • 1 Mar 2019 • Yu Bai, John Duchi, Song Mei
We study a family of (potentially non-convex) constrained optimization problems with convex composite structure.
no code implementations • 16 Feb 2019 • Song Mei, Theodor Misiakiewicz, Andrea Montanari
Earlier work shows that (under some regularity assumptions), the mean field description is accurate as soon as the number of hidden units is much larger than the dimension $D$.
no code implementations • 18 Apr 2018 • Song Mei, Andrea Montanari, Phan-Minh Nguyen
Does SGD converge to a global optimum of the risk or only to a local optimum?
no code implementations • 15 Nov 2017 • Gerard Ben Arous, Song Mei, Andrea Montanari, Mihai Nica
We compute the expected number of critical points and local maxima of this objective function, show that both counts are exponential in the dimension $n$, and give exact formulas for the exponential growth rates.
no code implementations • 25 Mar 2017 • Song Mei, Theodor Misiakiewicz, Andrea Montanari, Roberto I. Oliveira
In this paper we study the rank-constrained version of SDPs arising in MaxCut and in synchronization problems.
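Rank-constrained here means replacing the PSD variable $X$ with a factorization $X = V V^{\top}$, $V \in \mathbb{R}^{n \times k}$, whose rows have unit norm (the Burer-Monteiro form of the MaxCut SDP). A projected gradient ascent sketch, with rank, step size, and the random symmetric input all arbitrary choices of mine:

```python
# Rank-k MaxCut SDP via the Burer-Monteiro factorization X = V V^T with
# unit-norm rows: maximize <A, V V^T> over V in R^{n x k} by projected
# gradient ascent (for symmetric A, the gradient in V is 2 A V).
import numpy as np

rng = np.random.default_rng(0)
n, k, step = 30, 3, 0.05
A = rng.normal(size=(n, n))
A = (A + A.T) / 2                                  # symmetric input matrix

V = rng.normal(size=(n, k))
V /= np.linalg.norm(V, axis=1, keepdims=True)
for _ in range(500):
    V += step * (A @ V)                            # ascent step
    V /= np.linalg.norm(V, axis=1, keepdims=True)  # project rows to the sphere
print("objective <A, VV^T> =", np.sum((A @ V) * V))
```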
no code implementations • 22 Jul 2016 • Song Mei, Yu Bai, Andrea Montanari
We establish uniform convergence of the gradient and Hessian of the empirical risk to their population counterparts, as soon as the number of samples becomes larger than the number of unknown parameters (modulo logarithmic factors).