no code implementations • 29 Apr 2024 • Song Mei
U-Nets are among the most widely used architectures in computer vision, renowned for their exceptional performance in applications such as image segmentation, denoising, and diffusion modeling.
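For readers unfamiliar with the architecture, a U-Net pairs a downsampling encoder with an upsampling decoder and concatenates encoder features into the decoder through skip connections. A minimal PyTorch sketch (layer sizes are illustrative choices, not taken from the paper):

```python
# Minimal U-Net sketch (illustrative sizes, not from the paper): an encoder
# downsamples, a decoder upsamples, and a skip connection concatenates
# encoder features into the decoder at the matching resolution.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(),
    )

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(1, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)   # 64 = 32 (skip) + 32 (upsampled)
        self.out = nn.Conv2d(32, 1, 1)

    def forward(self, x):
        e1 = self.enc1(x)                # full resolution
        e2 = self.enc2(self.pool(e1))    # half resolution
        d1 = self.up(e2)                 # back to full resolution
        d1 = self.dec1(torch.cat([d1, e1], dim=1))  # skip connection
        return self.out(d1)

x = torch.randn(1, 1, 32, 32)
print(TinyUNet()(x).shape)  # torch.Size([1, 1, 32, 32])
```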
no code implementations • 11 Apr 2024 • Minshuo Chen, Song Mei, Jianqing Fan, Mengdi Wang
In this paper, we review emerging applications of diffusion models, focusing on how they generate samples under various forms of control.
no code implementations • 8 Apr 2024 • Ruiqi Zhang, Licong Lin, Yu Bai, Song Mei
LLM unlearning aims to eliminate the influence of undesirable data from a pre-trained model while preserving its utility on other tasks.
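As a point of reference, a common unlearning baseline alternates gradient ascent on the data to be forgotten with a penalty that preserves performance on retained data. This is a generic sketch, not the procedure analyzed in the paper; `model`, `loss_fn`, and the batch arguments are hypothetical placeholders:

```python
# Hedged sketch of a standard unlearning baseline (gradient ascent on the
# forget set plus a retain-set penalty), NOT the method studied in the paper.
def unlearn_step(model, loss_fn, forget_batch, retain_batch, opt, lam=1.0):
    xf, yf = forget_batch
    xr, yr = retain_batch
    # Ascend on the forget loss (hence the minus sign) while penalizing
    # any degradation on the retain data.
    loss = -loss_fn(model(xf), yf) + lam * loss_fn(model(xr), yr)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```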
no code implementations • 14 Nov 2023 • Michael Celentano, Zhou Fan, Licong Lin, Song Mei
In settings where it is conjectured that no efficient algorithm can find this local neighborhood, we prove analogous geometric properties for a local minimizer of the TAP free energy reachable by AMP, and show that posterior inference based on this minimizer remains correctly calibrated.
no code implementations • 16 Oct 2023 • Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai
Through extensive probing and a new pasting experiment, we further reveal several mechanisms within the trained transformers, such as concrete copying behaviors on both the inputs and the representations, linear ICL capability of the upper layers alone, and a post-ICL representation selection mechanism in a harder mixture setting.
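Probing here means fitting a simple readout on frozen intermediate representations to test what a layer encodes linearly. A minimal sketch, with `reps` and `labels` as hypothetical arrays of hidden states and probe targets:

```python
# Linear probing sketch: fit a ridge-regression readout on frozen hidden
# states; the probe's error measures how linearly decodable the target is.
import numpy as np

def fit_linear_probe(reps, labels, lam=1e-3):
    # reps: (num_examples, hidden_dim) frozen representations;
    # labels: (num_examples,) probe targets.
    d = reps.shape[1]
    w = np.linalg.solve(reps.T @ reps + lam * np.eye(d), reps.T @ labels)
    return w  # evaluate via the error of reps @ w on held-out examples
```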
no code implementations • 12 Oct 2023 • Licong Lin, Yu Bai, Song Mei
This provides the first quantitative analysis of the ICRL capabilities of transformers pretrained from offline trajectories.
no code implementations • 20 Sep 2023 • Song Mei, Yuchen Wu
We investigate the approximation efficiency of score functions by deep neural networks in diffusion-based generative modeling.
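The score function is $\nabla_x \log p_t(x)$, and in practice a network is fit to it via denoising score matching. A minimal sketch (the Gaussian noising and the `score_net` interface are illustrative assumptions):

```python
# Denoising score matching sketch: for x_t = x_0 + sigma * eps, the score of
# the noised distribution N(x_0, sigma^2 I) at x_t is -(x_t - x_0) / sigma^2
# = -eps / sigma, so the network is regressed onto that target.
import torch

def dsm_loss(score_net, x0, sigma):
    eps = torch.randn_like(x0)
    xt = x0 + sigma * eps
    target = -eps / sigma            # score of N(x0, sigma^2 I) at xt
    return ((score_net(xt, sigma) - target) ** 2).mean()
```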
no code implementations • 2 Feb 2023 • Fan Chen, Huan Wang, Caiming Xiong, Song Mei, Yu Bai
However, the fundamental limits for learning in revealing POMDPs are much less understood, with existing lower bounds being rather preliminary and having substantial gaps from the current best upper bounds.
1 code implementation • 4 Nov 2022 • Taejoo Ahn, Licong Lin, Song Mei
In this paper, we develop near-optimal multiple testing procedures for high dimensional Bayesian linear models with isotropic covariates.
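For contrast with the Bayesian procedures developed in the paper, the classical frequentist baseline is the Benjamini-Hochberg step-up procedure over p-values:

```python
# Benjamini-Hochberg step-up procedure (the classical p-value baseline; the
# paper's procedures are Bayesian and use posterior quantities instead).
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    n = len(pvals)
    order = np.argsort(pvals)
    thresh = alpha * np.arange(1, n + 1) / n
    below = pvals[order] <= thresh
    # Reject the k smallest p-values, where k is the largest index with
    # p_(k) <= alpha * k / n.
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = np.zeros(n, dtype=bool)
    rejected[order[:k]] = True
    return rejected
```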
no code implementations • 29 Sep 2022 • Fan Chen, Yu Bai, Song Mei
Recent work has identified several tractable subclasses that are learnable with polynomial samples, such as Partially Observable Markov Decision Processes (POMDPs) with certain revealing or decodability conditions.
no code implementations • 23 Sep 2022 • Fan Chen, Song Mei, Yu Bai
We make progress on this question by developing a unified algorithm framework for a large class of learning goals, building on the Decision-Estimation Coefficient (DEC) framework.
no code implementations • 30 May 2022 • Yu Bai, Chi Jin, Song Mei, Ziang Song, Tiancheng Yu
A conceptually appealing approach for learning Extensive-Form Games (EFGs) is to convert them to Normal-Form Games (NFGs).
no code implementations • 15 May 2022 • Ziang Song, Song Mei, Yu Bai
We then design an uncoupled no-regret algorithm that finds an $\varepsilon$-approximate $K$-EFCE within $\widetilde{\mathcal{O}}(\max_{i}X_iA_i^{K}/\varepsilon^2)$ iterations in the full feedback setting, where $X_i$ and $A_i$ are the number of information sets and actions for the $i$-th player.
1 code implementation • ICLR 2022 • Yu Bai, Song Mei, Huan Wang, Yingbo Zhou, Caiming Xiong
Experiments show that our algorithm learns valid prediction sets and significantly improves efficiency over existing approaches in several applications, such as shorter prediction intervals, minimum-volume prediction sets for multi-output regression, and label prediction sets for image classification.
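For context, the classical split conformal baseline that such methods improve upon fits in a few lines; `resid_cal` here is assumed to hold absolute residuals from a held-out calibration split:

```python
# Split conformal prediction baseline (the classical reference method, not
# the learned procedure proposed in the paper).
import numpy as np

def split_conformal_interval(resid_cal, y_pred_test, alpha=0.1):
    # resid_cal: absolute residuals |y - model(x)| on a calibration split.
    n = len(resid_cal)
    # Finite-sample corrected quantile level gives >= 1 - alpha coverage.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(resid_cal, level)
    return y_pred_test - q, y_pred_test + q
```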
no code implementations • 3 Feb 2022 • Yu Bai, Chi Jin, Song Mei, Tiancheng Yu
This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$, and matches the information-theoretic lower bound up to logarithmic factors.
no code implementations • 16 Nov 2021 • Theodor Misiakiewicz, Song Mei
Recent empirical work has shown that hierarchical convolutional kernels inspired by convolutional neural networks (CNNs) significantly improve the performance of kernel methods in image classification tasks.
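The simplest instance of such a kernel averages a base kernel over aligned local patches of the two inputs; a one-layer sketch (the patch size and quadratic base kernel are my illustrative choices, while the paper studies hierarchical compositions of such maps):

```python
# One-layer convolutional kernel sketch: average a base kernel over
# matching local patches of two 1-D signals of equal length.
import numpy as np

def conv_kernel(x, y, q=3):
    # q: patch size; the base kernel here is the quadratic kernel <u, v>^2.
    n = len(x) - q + 1
    vals = [np.dot(x[i:i+q], y[i:i+q]) ** 2 for i in range(n)]
    return np.mean(vals)
```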
no code implementations • ICLR 2022 • Nikhil Ghosh, Song Mei, Bin Yu
To understand how deep learning works, it is crucial to understand the training dynamics of neural networks.
no code implementations • ICLR 2022 • Ziang Song, Song Mei, Yu Bai
First, we design algorithms for learning an $\epsilon$-Coarse Correlated Equilibrium (CCE) in $\widetilde{\mathcal{O}}(H^5S\max_{i\le m} A_i / \epsilon^2)$ episodes, and an $\epsilon$-Correlated Equilibrium (CE) in $\widetilde{\mathcal{O}}(H^6S\max_{i\le m} A_i^2 / \epsilon^2)$ episodes.
no code implementations • 29 Sep 2021 • Lin Chen, Song Mei
Moreover, we theoretically show that the ridge estimator with optimal regularization can result in a monotone generalization risk curve and thereby eliminate multiple descent under some assumptions.
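The effect is easy to reproduce numerically (dimensions, noise level, and the near-zero ridge value below are my own choices, not the paper's setting): near the interpolation threshold $n \approx d$, the barely regularized estimator's risk spikes, while a tuned ridge penalty keeps the curve monotone.

```python
# Toy simulation: test risk of ridge regression across sample sizes, with a
# near-zero penalty (double-descent spike near n = d) vs. a tuned penalty.
import numpy as np

rng = np.random.default_rng(0)
d, n_test, sigma = 50, 2000, 0.5
beta = rng.normal(size=d) / np.sqrt(d)
Xte = rng.normal(size=(n_test, d))
yte = Xte @ beta + sigma * rng.normal(size=n_test)

for n in [25, 50, 100, 200]:
    X = rng.normal(size=(n, d))
    y = X @ beta + sigma * rng.normal(size=n)
    for lam in [1e-8, 1.0]:  # near-interpolation vs. tuned ridge
        bhat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
        risk = np.mean((Xte @ bhat - yte) ** 2)
        print(f"n={n:4d}  lambda={lam:g}  test risk={risk:.3f}")
```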
no code implementations • 21 Jun 2021 • Michael Celentano, Zhou Fan, Song Mei
This provides a rigorous foundation for variational inference in high dimensions via minimization of the TAP free energy.
no code implementations • NeurIPS 2021 • Yu Bai, Song Mei, Huan Wang, Caiming Xiong
Estimating the data uncertainty in regression tasks is often done by learning a quantile function or a prediction interval of the true label conditioned on the input.
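Quantile functions are typically learned by minimizing the pinball loss, whose population minimizer is the $\tau$-th conditional quantile; a minimal sketch:

```python
# Pinball (quantile) loss: minimizing it in expectation recovers the tau-th
# conditional quantile of y given x. A pair of quantile heads (e.g.
# tau = 0.05 and 0.95) then yields a 90% prediction interval.
import torch

def pinball_loss(pred, y, tau):
    diff = y - pred
    return torch.mean(torch.maximum(tau * diff, (tau - 1) * diff))
```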
no code implementations • 8 Mar 2021 • Zitong Yang, Yu Bai, Song Mei
We show that, in the setting where the classical uniform convergence bound is vacuous (diverges to $\infty$), uniform convergence over the interpolators still gives a non-trivial bound of the test error of interpolating solutions.
no code implementations • 25 Feb 2021 • Song Mei, Theodor Misiakiewicz, Andrea Montanari
Certain neural network architectures -- for instance, convolutional networks -- are believed to owe their success to the fact that they exploit such invariance properties.
no code implementations • 15 Feb 2021 • Yu Bai, Song Mei, Huan Wang, Caiming Xiong
Modern machine learning models with high accuracy are often miscalibrated -- the predicted top probability does not reflect the actual accuracy, and tends to be over-confident.
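Miscalibration is usually quantified by the expected calibration error (ECE), which bins predictions by confidence and compares average confidence to empirical accuracy within each bin; a minimal sketch (the 15-bin choice is a common convention, not from the paper):

```python
# Expected calibration error: weighted average, over confidence bins, of the
# gap between mean confidence and empirical accuracy.
import numpy as np

def ece(confidences, correct, n_bins=15):
    # confidences: predicted top probabilities; correct: 0/1 indicators.
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            total += mask.mean() * gap
    return total
```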
no code implementations • 26 Jan 2021 • Song Mei, Theodor Misiakiewicz, Andrea Montanari
We show that the test error of random features ridge regression is dominated by its approximation error and is larger than the error of KRR as long as $N\le n^{1-\delta}$ for some $\delta>0$.
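Here $N$ counts the random features and $n$ the samples. A minimal random features ridge regression sketch (the ReLU activation and problem sizes are illustrative choices):

```python
# Random features ridge regression: project inputs through N fixed random
# ReLU features, then solve ridge regression in feature space.
import numpy as np

rng = np.random.default_rng(0)
n, d, N, lam = 500, 20, 200, 1e-2
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

W = rng.normal(size=(d, N)) / np.sqrt(d)   # fixed random first layer
Phi = np.maximum(X @ W, 0.0)               # ReLU random features
a = np.linalg.solve(Phi.T @ Phi + lam * n * np.eye(N), Phi.T @ y)
print("train MSE:", np.mean((Phi @ a - y) ** 2))
```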
1 code implementation • NeurIPS 2020 • Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari
Recent empirical work showed that, for some classification tasks, RKHS methods can replace NNs without a large loss in performance.
1 code implementation • NeurIPS 2019 • Song Mei, Theodor Misiakiewicz, Behrooz Ghorbani, Andrea Montanari
We study the supervised learning problem under either of the following two models: (1) Feature vectors ${\boldsymbol x}_i$ are $d$-dimensional Gaussians and responses are $y_i = f_*({\boldsymbol x}_i)$ for $f_*$ an unknown quadratic function; (2) Feature vectors ${\boldsymbol x}_i$ are distributed as a mixture of two $d$-dimensional centered Gaussians, and $y_i$'s are the corresponding class labels.
no code implementations • 14 Aug 2019 • Song Mei, Andrea Montanari
We compute the precise asymptotics of the test error, in the limit $N, n, d\to \infty$ with $N/d$ and $n/d$ fixed.
1 code implementation • 21 Jun 2019 • Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari
We study the supervised learning problem under either of the following two models: (1) Feature vectors ${\boldsymbol x}_i$ are $d$-dimensional Gaussians and responses are $y_i = f_*({\boldsymbol x}_i)$ for $f_*$ an unknown quadratic function; (2) Feature vectors ${\boldsymbol x}_i$ are distributed as a mixture of two $d$-dimensional centered Gaussians, and $y_i$'s are the corresponding class labels.
no code implementations • 27 Apr 2019 • Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari
Both these approaches can also be regarded as randomized approximations of kernel ridge regression (with respect to different kernels), and enjoy universal approximation properties when the number of neurons $N$ diverges, for a fixed dimension $d$.
no code implementations • 1 Mar 2019 • Yu Bai, John Duchi, Song Mei
We study a family of (potentially non-convex) constrained optimization problems with convex composite structure.
no code implementations • 16 Feb 2019 • Song Mei, Theodor Misiakiewicz, Andrea Montanari
Earlier work shows that (under some regularity assumptions), the mean field description is accurate as soon as the number of hidden units is much larger than the dimension $D$.
no code implementations • 18 Apr 2018 • Song Mei, Andrea Montanari, Phan-Minh Nguyen
Does SGD converge to a global optimum of the risk or only to a local optimum?
no code implementations • 15 Nov 2017 • Gerard Ben Arous, Song Mei, Andrea Montanari, Mihai Nica
We compute the expected number of critical points and local maxima of this objective function, show that both counts are exponential in the dimension $n$, and give exact formulas for the exponential growth rates.
no code implementations • 25 Mar 2017 • Song Mei, Theodor Misiakiewicz, Andrea Montanari, Roberto I. Oliveira
In this paper we study the rank-constrained version of SDPs arising in MaxCut and in synchronization problems.
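Rank-constrained here means replacing the PSD variable $X$ with a factorization $X = V V^{\top}$, $V \in \mathbb{R}^{n \times k}$, whose rows have unit norm (the Burer-Monteiro form of the MaxCut SDP). A projected gradient ascent sketch, with rank, step size, and the random symmetric input all arbitrary choices of mine:

```python
# Rank-k MaxCut SDP via the Burer-Monteiro factorization X = V V^T with
# unit-norm rows: maximize <A, V V^T> over V in R^{n x k} by projected
# gradient ascent (for symmetric A, the gradient in V is 2 A V).
import numpy as np

rng = np.random.default_rng(0)
n, k, step = 30, 3, 0.05
A = rng.normal(size=(n, n))
A = (A + A.T) / 2                                  # symmetric input matrix

V = rng.normal(size=(n, k))
V /= np.linalg.norm(V, axis=1, keepdims=True)
for _ in range(500):
    V += step * (A @ V)                            # ascent step
    V /= np.linalg.norm(V, axis=1, keepdims=True)  # project rows to the sphere
print("objective <A, VV^T> =", np.sum((A @ V) * V))
```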
no code implementations • 22 Jul 2016 • Song Mei, Yu Bai, Andrea Montanari
We establish uniform convergence of the gradient and Hessian of the empirical risk to their population counterparts, as soon as the number of samples becomes larger than the number of unknown parameters (modulo logarithmic factors).