1 code implementation • 24 May 2024 • Aaron Defazio, Xingyu Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky
Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly outperformed by learning rate schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from convex problems to large-scale deep learning problems.
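A schedule-free step of the kind described can be sketched as follows. This is a minimal illustration, not the authors' exact algorithm: the interpolation/averaging structure, as well as the `lr` and `beta` values, are assumptions made here for the sake of a runnable example.

```python
import numpy as np

def schedule_free_sgd(grad, x0, lr=0.5, beta=0.9, steps=500):
    """Sketch of a schedule-free SGD loop (assumed form): the gradient is
    evaluated at an interpolation y between the base iterate z and a running
    average x, and x is a uniform average of the z sequence. No learning
    rate decay schedule, and hence no stopping step T, is needed."""
    z = x0.copy()
    x = x0.copy()
    for t in range(1, steps + 1):
        y = (1 - beta) * z + beta * x   # gradient-evaluation point
        z = z - lr * grad(y)            # plain SGD step with constant lr
        c = 1.0 / t                     # uniform (Polyak-style) averaging weight
        x = (1 - c) * x + c * z
    return x

# toy quadratic: f(x) = 0.5 * ||x||^2, gradient is the identity
x = schedule_free_sgd(lambda v: v, np.array([3.0, -1.0]))
```

The returned average x, not the base iterate z, is the point the sketch treats as the solution.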
no code implementations • 6 Mar 2024 • Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, Robert M. Gower
We develop new sub-optimality bounds for gradient descent (GD) that depend on the conditioning of the objective along the path of optimization, rather than on global, worst-case constants.
1 code implementation • 11 Oct 2023 • Aaron Defazio, Ashok Cutkosky, Harsh Mehta, Konstantin Mishchenko
To go beyond this worst-case analysis, we use the observed gradient norms to derive schedules refined for any particular task.
1 code implementation • 9 Jun 2023 • Konstantin Mishchenko, Aaron Defazio
We consider the problem of estimating the learning rate in adaptive methods, such as AdaGrad and Adam.
1 code implementation • 12 May 2023 • Fabian Schaipp, Ruben Ohana, Michael Eickenberg, Aaron Defazio, Robert M. Gower
MoMo uses momentum estimates of the losses and gradients sampled at each iteration to build a model of the loss function.
1 code implementation • 18 Jan 2023 • Aaron Defazio, Konstantin Mishchenko
D-Adaptation is an approach to automatically setting the learning rate which asymptotically achieves the optimal rate of convergence for minimizing convex Lipschitz functions, with no back-tracking or line searches, and no additional function value or gradient evaluations per step.
no code implementations • 14 Jun 2022 • Aaron Defazio, Baoyu Zhou, Lin Xiao
The classical AdaGrad method adapts the learning rate by dividing by the square root of a sum of squared gradients.
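The classical diagonal AdaGrad update described here fits in a few lines; the base learning rate and epsilon below are illustrative choices, not values from the paper.

```python
import numpy as np

def adagrad_step(x, grad, sum_sq, lr=1.0, eps=1e-8):
    """One diagonal AdaGrad step: accumulate squared gradients per coordinate
    and divide the base learning rate by the square root of that sum."""
    sum_sq = sum_sq + grad ** 2
    x = x - lr * grad / (np.sqrt(sum_sq) + eps)
    return x, sum_sq

# minimize f(x) = 0.5 * ||x||^2, whose gradient at x is x
x = np.array([1.0, -2.0])
s = np.zeros_like(x)
for _ in range(100):
    x, s = adagrad_step(x, x, s)
```

Coordinates with historically large gradients automatically receive smaller steps, which is the adaptivity the method is named for.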
no code implementations • 22 Jun 2021 • Robert M. Gower, Aaron Defazio, Michael Rabbat
MOTAPS can be seen as a variant of the Stochastic Polyak (SP) method, which also uses loss values to adjust the stepsize.
5 code implementations • 26 Jan 2021 • Aaron Defazio, Samy Jelassi
We introduce MADGRAD, a novel optimization method in the family of AdaGrad adaptive gradient methods.
no code implementations • 20 Oct 2020 • Samy Jelassi, Aaron Defazio
First-order stochastic optimization methods are currently the most widely used class of methods for training deep neural networks.
1 code implementation • 1 Oct 2020 • Aaron Defazio
Momentum methods are now used pervasively within the machine learning community for training non-convex models such as deep neural networks.
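For reference, the classical heavy-ball momentum update these methods build on can be sketched as follows (the `lr` and `beta` values are illustrative):

```python
import numpy as np

def momentum_step(x, v, grad, lr=0.1, beta=0.9):
    """One heavy-ball momentum step: accumulate an exponentially weighted
    velocity from past gradients and move along it."""
    v = beta * v + grad
    x = x - lr * v
    return x, v

# minimize f(x) = 0.5 * ||x||^2, whose gradient at x is x
x = np.array([2.0, -3.0])
v = np.zeros_like(x)
for _ in range(200):
    x, v = momentum_step(x, v, x)
```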
no code implementations • 14 Jun 2020 • Othmane Sebbouh, Robert M. Gower, Aaron Defazio
We show that these results still hold when using stochastic line search and stochastic Polyak stepsizes, thereby giving the first proof of convergence of these methods in the non-overparametrized regime.
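The stochastic Polyak stepsize referred to here sets gamma_k = (f_i(x_k) - f_i*) / ||grad f_i(x_k)||^2 for the sampled loss f_i. A minimal sketch, assuming interpolation (each f_i* = 0) and a least-squares objective, with an illustrative cap `gamma_max`:

```python
import numpy as np

def sps_step(x, loss_i, grad_i, f_star=0.0, gamma_max=1.0):
    """One SGD step with the stochastic Polyak stepsize:
    gamma = (f_i(x) - f_i^*) / ||grad f_i(x)||^2, capped at gamma_max."""
    g_norm_sq = np.dot(grad_i, grad_i)
    if g_norm_sq == 0.0:
        return x  # already stationary for this sample
    gamma = min((loss_i - f_star) / g_norm_sq, gamma_max)
    return x - gamma * grad_i

# interpolated least squares: f_i(x) = 0.5 * (a_i . x - b_i)^2, so f_i^* = 0
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
x_true = np.array([1.0, -1.0, 2.0])
b = A @ x_true
x = np.zeros(3)
for _ in range(2000):
    i = rng.integers(20)
    r = A[i] @ x - b[i]
    x = sps_step(x, 0.5 * r * r, r * A[i])
```

On this problem the step reduces to a relaxed Kaczmarz projection, which is why no hand-tuned learning rate is needed.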
no code implementations • 1 Jun 2020 • Aaron Defazio, Robert M. Gower
The convergence rates for convex and non-convex optimization methods depend on the choice of a host of constants, including step sizes, Lyapunov function constants and momentum constants.
3 code implementations • 14 Apr 2020 • Anuroop Sriram, Jure Zbontar, Tullie Murrell, Aaron Defazio, C. Lawrence Zitnick, Nafissa Yakubova, Florian Knoll, Patricia Johnson
The slow acquisition speed of magnetic resonance imaging (MRI) has led to the development of two complementary methods: acquiring multiple views of the anatomy simultaneously (parallel imaging) and acquiring fewer samples than necessary for traditional signal processing methods (compressed sensing).
Ranked #1 on MRI Reconstruction on fastMRI Knee 4x
1 code implementation • NeurIPS 2020 • Aaron Defazio, Tullie Murrell, Michael P. Recht
MRI images reconstructed from sub-sampled Cartesian data using deep learning techniques often show a characteristic banding (sometimes described as streaking), which is particularly strong in low signal-to-noise regions of the reconstructed image.
1 code implementation • 6 Jan 2020 • Florian Knoll, Tullie Murrell, Anuroop Sriram, Nafissa Yakubova, Jure Zbontar, Michael Rabbat, Aaron Defazio, Matthew J. Muckley, Daniel K. Sodickson, C. Lawrence Zitnick, Michael P. Recht
Conclusion: The challenge led to new developments in machine learning for image reconstruction, provided insight into the current state of the art in the field, and highlighted remaining hurdles for clinical adoption.
no code implementations • ICLR 2020 • Aaron Defazio, Leon Bottou
In this work, we describe a set of rules for the design and initialization of well-conditioned neural networks, guided by the goal of naturally balancing the diagonal blocks of the Hessian at the start of training.
2 code implementations • 2 Dec 2019 • Aaron Defazio
Deep learning approaches to accelerated MRI take a matrix of sampled Fourier-space lines as input and produce a spatial image as output.
1 code implementation • CVPR 2020 • Anuroop Sriram, Jure Zbontar, Tullie Murrell, C. Lawrence Zitnick, Aaron Defazio, Daniel K. Sodickson
In this paper, we present a novel method to integrate traditional parallel imaging methods into deep neural networks that is able to generate high quality reconstructions even for high acceleration factors.
no code implementations • 10 Jun 2019 • Aaron Defazio, Léon Bottou
We propose a system for calculating a "scaling constant" for layers and weights of neural networks.
no code implementations • ICLR 2019 • Aaron Defazio
We introduce a new normalization technique that exhibits the fast convergence properties of batch normalization using a transformation of layer weights instead of layer outputs.
no code implementations • NeurIPS 2019 • Aaron Defazio
In this work we propose a differential geometric motivation for Nesterov's accelerated gradient method (AGM) for strongly-convex problems.
1 code implementation • ICLR 2019 • Aaron Defazio, Léon Bottou
The applicability of these techniques to the hard non-convex optimization problems encountered during training of modern deep neural networks is an open problem.
11 code implementations • 21 Nov 2018 • Jure Zbontar, Florian Knoll, Anuroop Sriram, Tullie Murrell, Zhengnan Huang, Matthew J. Muckley, Aaron Defazio, Ruben Stern, Patricia Johnson, Mary Bruno, Marc Parente, Krzysztof J. Geras, Joe Katsnelson, Hersh Chandarana, Zizhao Zhang, Michal Drozdzal, Adriana Romero, Michael Rabbat, Pascal Vincent, Nafissa Yakubova, James Pinkerton, Duo Wang, Erich Owens, C. Lawrence Zitnick, Michael P. Recht, Daniel K. Sodickson, Yvonne W. Lui
Accelerating Magnetic Resonance Imaging (MRI) by taking fewer measurements has the potential to reduce medical costs, minimize stress to patients and make MRI possible in applications where it is currently prohibitively slow or expensive.
1 code implementation • NeurIPS 2016 • Aaron Defazio
We describe a novel optimization method for finite sums (such as empirical risk minimization problems) building on the recently introduced SAGA method.
no code implementations • 9 Oct 2015 • Aaron Defazio
For problems where the structure is known but the parameters unknown, we introduce an approximate maximum likelihood learning algorithm that is capable of learning a useful subclass of Gaussian graphical models.
no code implementations • 16 Apr 2015 • Mark Schmidt, Reza Babanezhad, Mohamed Osama Ahmed, Aaron Defazio, Ann Clifton, Anoop Sarkar
We apply stochastic average gradient (SAG) algorithms for training conditional random fields (CRFs).
no code implementations • 31 Oct 2014 • Aaron Defazio, Thore Graepel
Reinforcement learning agents have traditionally been evaluated on small toy problems.
5 code implementations • NeurIPS 2014 • Aaron Defazio, Francis Bach, Simon Lacoste-Julien
In this work we introduce a new optimisation method called SAGA in the spirit of SAG, SDCA, MISO and SVRG, a set of recently proposed incremental gradient algorithms with fast linear convergence rates.
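The core SAGA update keeps a table of the most recently seen gradient for each index i and corrects the current stochastic gradient with the table average. A minimal sketch (step size and the toy objective are illustrative, not from the paper):

```python
import numpy as np

def saga(grad_i, n, x0, lr, steps, rng):
    """Minimal SAGA loop: x <- x - lr * (g_j - table[j] + mean(table)),
    where g_j is the fresh gradient of the sampled component and table[j]
    is the stale gradient stored for that component."""
    x = x0.copy()
    table = np.array([grad_i(j, x0) for j in range(n)])
    avg = table.mean(axis=0)
    for _ in range(steps):
        j = rng.integers(n)
        g = grad_i(j, x)
        x = x - lr * (g - table[j] + avg)   # variance-reduced step
        avg = avg + (g - table[j]) / n      # keep the average in sync
        table[j] = g
    return x

# toy finite sum: f_j(x) = 0.5 * ||x - a_j||^2, minimized at the mean of a
rng = np.random.default_rng(1)
a = rng.normal(size=(10, 2))
x = saga(lambda j, x: x - a[j], 10, np.zeros(2), lr=0.3, steps=2000, rng=rng)
```

Unlike plain SGD, the corrected gradient has vanishing variance at the optimum, which yields the fast linear convergence rates the abstract refers to.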
no code implementations • NeurIPS 2012 • Aaron Defazio, Tibério S. Caetano
We consider the case where the structure of the graph to be reconstructed is known to be scale-free.