no code implementations • 23 May 2023 • Achraf Bahamou, Donald Goldfarb
We propose a new per-layer adaptive step-size procedure for stochastic first-order optimization methods that minimize empirical loss functions in deep learning, eliminating the need for the user to tune the learning rate (LR).
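The abstract does not spell out the step-size rule itself, so the following is only a rough, hypothetical illustration of the per-layer idea: each layer receives its own step length from a local quadratic model along that layer's negative-gradient direction. The toy quadratic losses, layer sizes, and helper names are assumptions made for the sketch, not the paper's method.

```python
import numpy as np

# Toy per-layer quadratic losses f_l(w) = 0.5 * w^T A_l w - b_l^T w, standing in
# for the curvature information a real method would have to estimate stochastically.
rng = np.random.default_rng(0)
layers = []
for dim in (8, 4):                       # hypothetical layer sizes
    M = rng.standard_normal((dim, dim))
    layers.append({"A": M @ M.T + np.eye(dim),
                   "b": rng.standard_normal(dim),
                   "w": np.zeros(dim)})

def per_layer_step(layer):
    """One update with a layer-specific step size from a local quadratic model."""
    A, b, w = layer["A"], layer["b"], layer["w"]
    g = A @ w - b                        # this layer's gradient
    d = -g                               # descent direction
    eta = (g @ g) / (d @ (A @ d))        # exact minimizer along d for a quadratic
    layer["w"] = w + eta * d
    return eta

for step in range(3):
    etas = [per_layer_step(layer) for layer in layers]
    print(f"step {step}: per-layer step sizes = {[round(float(e), 4) for e in etas]}")
```

Each layer thus gets an automatically chosen step length; no global learning rate is tuned by hand in this toy setting.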
no code implementations • 8 Feb 2022 • Achraf Bahamou, Donald Goldfarb, Yi Ren
Specifically, our method uses a block-diagonal approximation to the empirical Fisher matrix, where for each layer in the DNN, whether it is convolutional or feed-forward and fully connected, the associated diagonal block is itself block-diagonal and is composed of a large number of mini-blocks of modest size.
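As a loose illustration of the mini-block idea, the sketch below builds, for a single fully connected layer, one small empirical-Fisher block per output row and uses the inverted blocks to precondition the gradient. The blocking by output row, the sizes, and the damping value are assumptions for the sketch, not necessarily the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, out_dim, in_dim = 32, 6, 5          # hypothetical sizes
# Per-sample gradients of one fully connected layer's weight matrix.
per_sample_grads = rng.standard_normal((n_samples, out_dim, in_dim))

damping = 1e-3
inv_blocks = []
for r in range(out_dim):                       # one mini-block per output row
    G = per_sample_grads[:, r, :]              # (n_samples, in_dim)
    F_block = G.T @ G / n_samples + damping * np.eye(in_dim)
    inv_blocks.append(np.linalg.inv(F_block))  # small, so cheap to invert

mean_grad = per_sample_grads.mean(axis=0)
precond_grad = np.stack([inv_blocks[r] @ mean_grad[r] for r in range(out_dim)])
print(precond_grad.shape)                      # (6, 5): preconditioned layer update
```

The point of the mini-block structure is that each block stays small enough to invert (or factor) cheaply, even for wide layers.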
1 code implementation • NeurIPS 2021 • Yi Ren, Donald Goldfarb
Based on the so-called tensor normal (TN) distribution, we propose and analyze a brand new approximate natural gradient method, Tensor Normal Training (TNT), which, like Shampoo, only requires knowledge of the shape of the training parameters.
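A minimal sketch of the shape-only, Kronecker-factored flavor of such a method: for a weight matrix of shape (m, n), two factor matrices A (m x m) and B (n x n) are estimated from per-sample gradients and the preconditioned direction is A^{-1} G B^{-1}. The sample counts, damping, and estimation scheme below are assumptions, not TNT's exact statistics.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, n_samples = 6, 4, 64                     # only the weight shape (m, n) matters
G_samples = rng.standard_normal((n_samples, m, n))   # per-sample gradients of W

damping = 1e-3
# Factor covariances along each tensor mode (the Kronecker / tensor-normal factors).
A = sum(G @ G.T for G in G_samples) / (n_samples * n) + damping * np.eye(m)
B = sum(G.T @ G for G in G_samples) / (n_samples * m) + damping * np.eye(n)

G_mean = G_samples.mean(axis=0)
# Approximate natural-gradient direction: A^{-1} G B^{-1} instead of a full Fisher solve.
nat_grad = np.linalg.solve(A, G_mean) @ np.linalg.inv(B)
print(nat_grad.shape)                          # (6, 4)
```

Because only factor matrices of sizes m and n are ever formed, the cost scales with the layer dimensions rather than with the full m*n parameter count.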
no code implementations • 12 Feb 2021 • Yi Ren, Achraf Bahamou, Donald Goldfarb
We also propose several improvements to the methods in Goldfarb et al. (2020) that can be applied to both MLPs and CNNs.
1 code implementation • NeurIPS 2020 • Donald Goldfarb, Yi Ren, Achraf Bahamou
We consider the development of practical stochastic quasi-Newton, and in particular Kronecker-factored block-diagonal BFGS and L-BFGS methods, for training deep neural networks (DNNs).
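For reference, the textbook BFGS update of an inverse-Hessian approximation is shown below on a tiny deterministic quadratic; the paper's K-BFGS/K-L-BFGS methods maintain damped, Kronecker-factored, block-diagonal versions of such approximations per layer, which this sketch does not attempt to reproduce.

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """Standard BFGS update of an inverse-Hessian approximation H from
    a parameter difference s and a gradient difference y."""
    rho = 1.0 / (y @ s)
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

# Usage on f(w) = 0.5 * w^T A w, far simpler than a stochastic DNN loss.
A = np.array([[3.0, 0.5], [0.5, 2.0]])
w, H = np.array([1.0, -1.0]), np.eye(2)
for _ in range(5):
    g = A @ w
    w_new = w - H @ g
    s, y = w_new - w, A @ w_new - g            # curvature pair (s, y)
    H = bfgs_inverse_update(H, s, y)
    w = w_new
print(w)                                       # converges toward the minimizer at 0
```

In the stochastic, per-layer setting the same (s, y) recursion is applied to Kronecker factors, with damping to keep the updates well defined under noise.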
no code implementations • 31 Dec 2019 • Achraf Bahamou, Donald Goldfarb
We also propose an adaptive version of ADAM that eliminates the need to tune the base learning rate and compares favorably to fine-tuned ADAM on training DNNs.
no code implementations • 5 Jun 2019 • Yi Ren, Donald Goldfarb
We present practical Levenberg-Marquardt variants of Gauss-Newton and natural gradient methods for solving non-convex optimization problems that arise in training deep neural networks involving enormous numbers of variables and huge data sets.
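For context, the classical Levenberg-Marquardt step that such variants build on solves a damped Gauss-Newton system; the display below is the textbook form (with J the Jacobian of the residual vector r at the current iterate w_k and lambda > 0 the damping parameter), not the paper's specific stochastic variant, in which, for instance, a Fisher matrix may take the place of J^T J.

```latex
(J^{\top} J + \lambda I)\, p_k = -\, J^{\top} r(w_k), \qquad w_{k+1} = w_k + p_k
```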
1 code implementation • NeurIPS 2019 • Yunfei Teng, Wenbo Gao, Francois Chalus, Anna Choromanska, Donald Goldfarb, Adrian Weller
Finally, we implement an asynchronous version of our algorithm and extend it to the multi-leader setting, where we form groups of workers, each represented by its own local leader (the best performer in a group), and update each worker with a corrective direction composed of two attractive forces: one toward the local leader and one toward the global leader (the best performer among all workers).
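A toy, single-process sketch of this multi-leader update is given below: every worker takes a gradient step plus two attractive pulls, one toward its group's local leader and one toward the global leader. The objective, group sizes, and coefficients (eta, lam_local, lam_global) are hypothetical placeholders rather than the paper's settings, and the real algorithm runs asynchronously across machines.

```python
import numpy as np

rng = np.random.default_rng(2)
n_groups, workers_per_group, dim = 2, 3, 4
workers = rng.standard_normal((n_groups, workers_per_group, dim))

def loss(w):                                   # toy objective standing in for the training loss
    return float(np.sum((w - 1.0) ** 2))

def grad(w):
    return 2.0 * (w - 1.0)

eta, lam_local, lam_global = 0.1, 0.05, 0.05   # hypothetical coefficients
for _ in range(100):
    # Local leader = best worker within each group; global leader = best worker overall.
    local_leaders = np.array(
        [grp[min(range(workers_per_group), key=lambda i: loss(grp[i]))] for grp in workers])
    flat = workers.reshape(-1, dim)
    global_leader = flat[min(range(len(flat)), key=lambda i: loss(flat[i]))].copy()
    for gi in range(n_groups):
        for wi in range(workers_per_group):
            w = workers[gi, wi]
            workers[gi, wi] = (w - eta * grad(w)
                               + lam_local * (local_leaders[gi] - w)    # pull toward local leader
                               + lam_global * (global_leader - w))      # pull toward global leader
print(round(loss(workers[0, 0]), 6))           # all workers are drawn toward the minimizer
```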
no code implementations • 26 Mar 2019 • Yuan Gao, Christian Kroer, Donald Goldfarb
In particular, the increasing averages consistently outperform the uniform averages in all test problems by orders of magnitude.
no code implementations • ICML 2017 • Chaoxu Zhou, Wenbo Gao, Donald Goldfarb
We propose a novel class of stochastic, adaptive methods for minimizing self-concordant functions which can be expressed as an expected value.
no code implementations • 5 Jul 2016 • Xiao Wang, Shiqian Ma, Donald Goldfarb, Wei Liu
In this paper we study stochastic quasi-Newton methods for nonconvex stochastic optimization, where we assume that noisy information about the gradients of the objective function is available via a stochastic first-order oracle (SFO).
no code implementations • 29 Mar 2014 • Cun Mu, Yuqian Zhang, John Wright, Donald Goldfarb
Recovering matrices from compressive and grossly corrupted observations is a fundamental problem in robust statistics, with rich applications in computer vision and machine learning.
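For reference, the standard convex formulation behind this recovery task (principal component pursuit with a general, possibly compressive, linear measurement operator) is stated below; the notation is the usual one and is not claimed to match the paper's exact model.

```latex
\min_{L,\,S}\;\; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{s.t.} \quad \mathcal{A}(L + S) = \mathcal{A}(M)
```

Here the nuclear norm promotes a low-rank L, the entrywise l1 norm promotes a sparse corruption S, M is the underlying matrix, and A is the measurement operator.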
no code implementations • 24 Nov 2013 • Donald Goldfarb, Zhiwei Qin
Robust tensor recovery plays an instrumental role in robustifying tensor decompositions for multilinear data analysis against outliers, gross corruptions, and missing values, and it has a diverse array of applications.
no code implementations • 26 Sep 2013 • Necdet Serhat Aybat, Donald Goldfarb, Shiqian Ma
Moreover, if the observed data matrix has also been corrupted by a dense noise matrix in addition to gross sparse error, then the stable principal component pursuit (SPCP) problem is solved to recover the low-rank matrix.
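A standard statement of the SPCP problem mentioned here, in assumed (not quoted) notation, relaxes the exact decomposition constraint to a noise ball:

```latex
\min_{L,\,S}\;\; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{s.t.} \quad \|M - L - S\|_{F} \le \delta
```

where delta bounds the Frobenius norm of the dense noise; setting delta = 0 recovers the noiseless robust PCA model.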
Optimization and Control
no code implementations • 22 Jul 2013 • Cun Mu, Bo Huang, John Wright, Donald Goldfarb
The most popular convex relaxation of this problem minimizes the sum of the nuclear norms of the unfoldings of the tensor.
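Written out in generic notation (which may differ from the paper's), this sum-of-nuclear-norms relaxation reads:

```latex
\min_{\mathcal{X}}\;\; \sum_{i=1}^{K} \big\|\mathcal{X}_{(i)}\big\|_{*}
\quad \text{s.t.} \quad \mathcal{G}(\mathcal{X}) = \mathcal{G}(\mathcal{X}_{0})
```

where X_(i) denotes the mode-i unfolding (matricization) of the K-way tensor X and G is the observation operator.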
no code implementations • 11 May 2011 • Necdet Serhat Aybat, Donald Goldfarb, Garud Iyengar
The stable principal component pursuit (SPCP) problem is a non-smooth convex optimization problem, the solution of which has been shown both in theory and in practice to enable one to recover the low-rank and sparse components of a matrix whose elements have been corrupted by Gaussian noise.
Optimization and Control
no code implementations • NeurIPS 2010 • Katya Scheinberg, Shiqian Ma, Donald Goldfarb
Gaussian graphical models are of great interest in statistical learning.
no code implementations • 23 Dec 2009 • Donald Goldfarb, Shiqian Ma, Katya Scheinberg
We present in this paper first-order alternating linearization algorithms based on an alternating direction augmented Lagrangian approach for minimizing the sum of two convex functions.
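One generic alternating-linearization step for minimizing f(x) + g(x) with both terms smooth is sketched below; this is a simplified view under that smoothness assumption, whereas the paper's algorithms are built on an alternating direction augmented Lagrangian and also handle nonsmooth terms through proximal subproblems.

```latex
x^{k+1} = \arg\min_{x}\; f(x) + \langle \nabla g(y^{k}),\, x - y^{k} \rangle + \tfrac{1}{2\mu}\|x - y^{k}\|^{2},
\qquad
y^{k+1} = \arg\min_{y}\; \langle \nabla f(x^{k+1}),\, y - x^{k+1} \rangle + g(y) + \tfrac{1}{2\mu}\|y - x^{k+1}\|^{2}
```

Each subproblem keeps one of the two functions exact and replaces the other by its linearization plus a proximal term of weight 1/(2 mu).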
1 code implementation • 11 May 2009 • Shiqian Ma, Donald Goldfarb, Lifeng Chen
The tightest convex relaxation of this problem is the linearly constrained nuclear norm minimization problem.
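In standard (assumed) notation, that relaxation is:

```latex
\min_{X}\;\; \|X\|_{*} \quad \text{s.t.} \quad \mathcal{A}(X) = b
```

For matrix completion the constraint specializes to X_ij = M_ij for all observed entries (i, j) in Omega.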
Optimization and Control • Information Theory