1 code implementation • 24 May 2024 • Aaron Defazio, Xingyu Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky
Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly outperformed by learning rate schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from convex problems to large-scale deep learning problems.
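A schedule-free step of the kind described can be sketched as follows. This is a minimal illustration, not the authors' exact algorithm: the interpolation/averaging structure, as well as the `lr` and `beta` values, are assumptions made here for the sake of a runnable example.

```python
import numpy as np

def schedule_free_sgd(grad, x0, lr=0.5, beta=0.9, steps=500):
    """Sketch of a schedule-free SGD loop (assumed form): the gradient is
    evaluated at an interpolation y between the base iterate z and a running
    average x, and x is a uniform average of the z sequence. No learning
    rate decay schedule, and hence no stopping step T, is needed."""
    z = x0.copy()
    x = x0.copy()
    for t in range(1, steps + 1):
        y = (1 - beta) * z + beta * x   # gradient-evaluation point
        z = z - lr * grad(y)            # plain SGD step with constant lr
        c = 1.0 / t                     # uniform (Polyak-style) averaging weight
        x = (1 - c) * x + c * z
    return x

# toy quadratic: f(x) = 0.5 * ||x||^2, gradient is the identity
x = schedule_free_sgd(lambda v: v, np.array([3.0, -1.0]))
```

The returned average x, not the base iterate z, is the point the sketch treats as the solution.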
no code implementations • 6 Mar 2024 • Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, Robert M. Gower
We develop new sub-optimality bounds for gradient descent (GD) that depend on the conditioning of the objective along the path of optimization, rather than on global, worst-case constants.
1 code implementation • 11 Oct 2023 • Aaron Defazio, Ashok Cutkosky, Harsh Mehta, Konstantin Mishchenko
To go beyond this worst-case analysis, we use the observed gradient norms to derive schedules refined for any particular task.
1 code implementation • 9 Jun 2023 • Konstantin Mishchenko, Aaron Defazio
We consider the problem of estimating the learning rate in adaptive methods, such as AdaGrad and Adam.
1 code implementation • 12 May 2023 • Fabian Schaipp, Ruben Ohana, Michael Eickenberg, Aaron Defazio, Robert M. Gower
MoMo uses momentum estimates of the losses and gradients sampled at each iteration to build a model of the loss function.
1 code implementation • 18 Jan 2023 • Aaron Defazio, Konstantin Mishchenko
D-Adaptation is an approach to automatically setting the learning rate which asymptotically achieves the optimal rate of convergence for minimizing convex Lipschitz functions, with no back-tracking or line searches, and no additional function value or gradient evaluations per step.
no code implementations • 14 Jun 2022 • Aaron Defazio, Baoyu Zhou, Lin Xiao
The classical AdaGrad method adapts the learning rate by dividing by the square root of a sum of squared gradients.
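The classical diagonal AdaGrad update described here fits in a few lines; the base learning rate and epsilon below are illustrative choices, not values from the paper.

```python
import numpy as np

def adagrad_step(x, grad, sum_sq, lr=1.0, eps=1e-8):
    """One diagonal AdaGrad step: accumulate squared gradients per coordinate
    and divide the base learning rate by the square root of that sum."""
    sum_sq = sum_sq + grad ** 2
    x = x - lr * grad / (np.sqrt(sum_sq) + eps)
    return x, sum_sq

# minimize f(x) = 0.5 * ||x||^2, whose gradient at x is x
x = np.array([1.0, -2.0])
s = np.zeros_like(x)
for _ in range(100):
    x, s = adagrad_step(x, x, s)
```

Coordinates with historically large gradients automatically receive smaller steps, which is the adaptivity the method is named for.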
no code implementations • 22 Jun 2021 • Robert M. Gower, Aaron Defazio, Michael Rabbat
MOTAPS can be seen as a variant of the Stochastic Polyak (SP) method, which also uses loss values to adjust the stepsize.
5 code implementations • 26 Jan 2021 • Aaron Defazio, Samy Jelassi
We introduce MADGRAD, a novel optimization method in the family of AdaGrad adaptive gradient methods.
no code implementations • 20 Oct 2020 • Samy Jelassi, Aaron Defazio
First-order stochastic optimization methods are currently the most widely used class of methods for training deep neural networks.
1 code implementation • 1 Oct 2020 • Aaron Defazio
Momentum methods are now used pervasively within the machine learning community for training non-convex models such as deep neural networks.
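For reference, the classical heavy-ball momentum update these methods build on can be sketched as follows (the `lr` and `beta` values are illustrative):

```python
import numpy as np

def momentum_step(x, v, grad, lr=0.1, beta=0.9):
    """One heavy-ball momentum step: accumulate an exponentially weighted
    velocity from past gradients and move along it."""
    v = beta * v + grad
    x = x - lr * v
    return x, v

# minimize f(x) = 0.5 * ||x||^2, whose gradient at x is x
x = np.array([2.0, -3.0])
v = np.zeros_like(x)
for _ in range(200):
    x, v = momentum_step(x, v, x)
```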
no code implementations • 14 Jun 2020 • Othmane Sebbouh, Robert M. Gower, Aaron Defazio
We show that these results still hold when using stochastic line search and stochastic Polyak stepsizes, thereby giving the first proof of convergence of these methods in the non-overparametrized regime.
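The stochastic Polyak stepsize referred to here sets gamma_k = (f_i(x_k) - f_i*) / ||grad f_i(x_k)||^2 for the sampled loss f_i. A minimal sketch, assuming interpolation (each f_i* = 0) and a least-squares objective, with an illustrative cap `gamma_max`:

```python
import numpy as np

def sps_step(x, loss_i, grad_i, f_star=0.0, gamma_max=1.0):
    """One SGD step with the stochastic Polyak stepsize:
    gamma = (f_i(x) - f_i^*) / ||grad f_i(x)||^2, capped at gamma_max."""
    g_norm_sq = np.dot(grad_i, grad_i)
    if g_norm_sq == 0.0:
        return x  # already stationary for this sample
    gamma = min((loss_i - f_star) / g_norm_sq, gamma_max)
    return x - gamma * grad_i

# interpolated least squares: f_i(x) = 0.5 * (a_i . x - b_i)^2, so f_i^* = 0
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
x_true = np.array([1.0, -1.0, 2.0])
b = A @ x_true
x = np.zeros(3)
for _ in range(2000):
    i = rng.integers(20)
    r = A[i] @ x - b[i]
    x = sps_step(x, 0.5 * r * r, r * A[i])
```

On this problem the step reduces to a relaxed Kaczmarz projection, which is why no hand-tuned learning rate is needed.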
no code implementations • 1 Jun 2020 • Aaron Defazio, Robert M. Gower
The convergence rates for convex and non-convex optimization methods depend on the choice of a host of constants, including step sizes, Lyapunov function constants and momentum constants.
3 code implementations • 14 Apr 2020 • Anuroop Sriram, Jure Zbontar, Tullie Murrell, Aaron Defazio, C. Lawrence Zitnick, Nafissa Yakubova, Florian Knoll, Patricia Johnson
The slow acquisition speed of magnetic resonance imaging (MRI) has led to the development of two complementary methods: acquiring multiple views of the anatomy simultaneously (parallel imaging) and acquiring fewer samples than necessary for traditional signal processing methods (compressed sensing).
Ranked #1 on MRI Reconstruction on fastMRI Knee 4x
1 code implementation • NeurIPS 2020 • Aaron Defazio, Tullie Murrell, Michael P. Recht
MRI images reconstructed from sub-sampled Cartesian data using deep learning techniques often show a characteristic banding (sometimes described as streaking), which is particularly strong in low signal-to-noise regions of the reconstructed image.
1 code implementation • 6 Jan 2020 • Florian Knoll, Tullie Murrell, Anuroop Sriram, Nafissa Yakubova, Jure Zbontar, Michael Rabbat, Aaron Defazio, Matthew J. Muckley, Daniel K. Sodickson, C. Lawrence Zitnick, Michael P. Recht
Conclusion: The challenge led to new developments in machine learning for image reconstruction, provided insight into the current state of the art in the field, and highlighted remaining hurdles for clinical adoption.
no code implementations • ICLR 2020 • Aaron Defazio, Leon Bottou
In this work, we describe a set of rules for the design and initialization of well-conditioned neural networks, guided by the goal of naturally balancing the diagonal blocks of the Hessian at the start of training.
2 code implementations • 2 Dec 2019 • Aaron Defazio
Deep learning approaches to accelerated MRI take a matrix of sampled Fourier-space lines as input and produce a spatial image as output.
1 code implementation • CVPR 2020 • Anuroop Sriram, Jure Zbontar, Tullie Murrell, C. Lawrence Zitnick, Aaron Defazio, Daniel K. Sodickson
In this paper, we present a novel method to integrate traditional parallel imaging methods into deep neural networks that is able to generate high quality reconstructions even for high acceleration factors.
no code implementations • 10 Jun 2019 • Aaron Defazio, Léon Bottou
We propose a system for calculating a "scaling constant" for layers and weights of neural networks.
no code implementations • ICLR 2019 • Aaron Defazio
We introduce a new normalization technique that exhibits the fast convergence properties of batch normalization using a transformation of layer weights instead of layer outputs.
no code implementations • NeurIPS 2019 • Aaron Defazio
In this work we propose a differential geometric motivation for Nesterov's accelerated gradient method (AGM) for strongly-convex problems.
1 code implementation • ICLR 2019 • Aaron Defazio, Léon Bottou
The applicability of these techniques to the hard non-convex optimization problems encountered during training of modern deep neural networks is an open problem.
11 code implementations • 21 Nov 2018 • Jure Zbontar, Florian Knoll, Anuroop Sriram, Tullie Murrell, Zhengnan Huang, Matthew J. Muckley, Aaron Defazio, Ruben Stern, Patricia Johnson, Mary Bruno, Marc Parente, Krzysztof J. Geras, Joe Katsnelson, Hersh Chandarana, Zizhao Zhang, Michal Drozdzal, Adriana Romero, Michael Rabbat, Pascal Vincent, Nafissa Yakubova, James Pinkerton, Duo Wang, Erich Owens, C. Lawrence Zitnick, Michael P. Recht, Daniel K. Sodickson, Yvonne W. Lui
Accelerating Magnetic Resonance Imaging (MRI) by taking fewer measurements has the potential to reduce medical costs, minimize stress to patients and make MRI possible in applications where it is currently prohibitively slow or expensive.
1 code implementation • NeurIPS 2016 • Aaron Defazio
We describe a novel optimization method for finite sums (such as empirical risk minimization problems) building on the recently introduced SAGA method.
no code implementations • 9 Oct 2015 • Aaron Defazio
For problems where the structure is known but the parameters unknown, we introduce an approximate maximum likelihood learning algorithm that is capable of learning a useful subclass of Gaussian graphical models.
no code implementations • 16 Apr 2015 • Mark Schmidt, Reza Babanezhad, Mohamed Osama Ahmed, Aaron Defazio, Ann Clifton, Anoop Sarkar
We apply stochastic average gradient (SAG) algorithms for training conditional random fields (CRFs).
no code implementations • 31 Oct 2014 • Aaron Defazio, Thore Graepel
Reinforcement learning agents have traditionally been evaluated on small toy problems.
5 code implementations • NeurIPS 2014 • Aaron Defazio, Francis Bach, Simon Lacoste-Julien
In this work we introduce a new optimisation method called SAGA in the spirit of SAG, SDCA, MISO and SVRG, a set of recently proposed incremental gradient algorithms with fast linear convergence rates.
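The core SAGA update keeps a table of the most recently seen gradient for each index i and corrects the current stochastic gradient with the table average. A minimal sketch (step size and the toy objective are illustrative, not from the paper):

```python
import numpy as np

def saga(grad_i, n, x0, lr, steps, rng):
    """Minimal SAGA loop: x <- x - lr * (g_j - table[j] + mean(table)),
    where g_j is the fresh gradient of the sampled component and table[j]
    is the stale gradient stored for that component."""
    x = x0.copy()
    table = np.array([grad_i(j, x0) for j in range(n)])
    avg = table.mean(axis=0)
    for _ in range(steps):
        j = rng.integers(n)
        g = grad_i(j, x)
        x = x - lr * (g - table[j] + avg)   # variance-reduced step
        avg = avg + (g - table[j]) / n      # keep the average in sync
        table[j] = g
    return x

# toy finite sum: f_j(x) = 0.5 * ||x - a_j||^2, minimized at the mean of a
rng = np.random.default_rng(1)
a = rng.normal(size=(10, 2))
x = saga(lambda j, x: x - a[j], 10, np.zeros(2), lr=0.3, steps=2000, rng=rng)
```

Unlike plain SGD, the corrected gradient has vanishing variance at the optimum, which yields the fast linear convergence rates the abstract refers to.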
no code implementations • NeurIPS 2012 • Aaron Defazio, Tibério S. Caetano
We consider the case where the structure of the graph to be reconstructed is known to be scale-free.