no code implementations • 28 Dec 2023 • Yossi Arjevani
The theoretical results, stated and proved for o-minimal structures, show that the set comprising all tangency arcs is topologically sufficiently tame to enable a numerical construction of tangency arcs, and so to compare how minima of both types are positioned relative to adjacent critical points.
no code implementations • 13 Jun 2023 • Yossi Arjevani, Gal Vinograd
The rich symmetry structure is used to construct infinite families of critical points represented by Puiseux series in the problem dimension, and so to obtain precise analytic estimates of the value of the objective function and of the Hessian spectrum.
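Here, a Puiseux series in the problem dimension $d$ is a series in fractional powers of $1/d$, e.g. of the form $c_0 + c_1 d^{-1/2} + c_2 d^{-1} + c_3 d^{-3/2} + \cdots$ (an illustrative form; the specific exponents and coefficients are given in the paper).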
no code implementations • 12 Oct 2022 • Yossi Arjevani, Michael Field
We study the optimization problem associated with fitting two-layer ReLU neural networks with respect to the squared loss, where labels are generated by a target network.
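As an illustration of this setting, the following is a minimal sketch of the student-teacher squared loss for two-layer ReLU networks (not the authors' code; the unit outer weights, Gaussian inputs, and dimensions are illustrative assumptions):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def student_teacher_loss(W, V, X):
    """Squared loss between a student two-layer ReLU network with weights W (k x d)
    and a teacher network with weights V (k x d), averaged over inputs X (n x d).
    Both networks are assumed to have unit outer weights, a common simplification."""
    student = relu(X @ W.T).sum(axis=1)   # student predictions
    teacher = relu(X @ V.T).sum(axis=1)   # labels generated by the target network
    return 0.5 * np.mean((student - teacher) ** 2)

# Illustrative values: d-dimensional Gaussian inputs, k hidden neurons.
rng = np.random.default_rng(0)
d, k, n = 10, 4, 1000
V = rng.standard_normal((k, d))          # fixed teacher weights
W = rng.standard_normal((k, d))          # student initialization
X = rng.standard_normal((n, d))
print(student_teacher_loss(W, V, X))
```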
no code implementations • NeurIPS 2021 • Yossi Arjevani, Michael Field
In particular, we derive analytic estimates for the loss at different minima, and prove that modulo $O(d^{-1/2})$-terms the Hessian spectrum concentrates near small positive constants, with the exception of $\Theta(d)$ eigenvalues which grow linearly with $d$.
no code implementations • 6 Jul 2021 • Yossi Arjevani, Michael Field
Motivated by questions originating from the study of a class of shallow student-teacher neural networks, we develop methods for the analysis of spurious minima in classes of gradient equivariant dynamics related to neural networks.
no code implementations • 10 Mar 2021 • Yossi Arjevani, Joan Bruna, Michael Field, Joe Kileel, Matthew Trager, Francis Williams
In this note, we consider the highly nonconvex optimization problem associated with computing the rank decomposition of symmetric tensors.
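For concreteness, the nonconvex objective in question can be sketched as follows (assuming a third-order symmetric tensor and a squared Frobenius error; an illustration of the problem, not code from the paper):

```python
import numpy as np

def sym_rank_decomposition_loss(A, T):
    """Nonconvex objective for symmetric tensor rank decomposition:
    approximate a symmetric 3-tensor T (d x d x d) by a sum of r rank-one
    terms a_i (x) a_i (x) a_i, where the rows of A (shape r x d) are the a_i."""
    approx = np.einsum('ri,rj,rk->ijk', A, A, A)   # sum_i a_i (x) a_i (x) a_i
    return 0.5 * np.sum((approx - T) ** 2)

# Illustrative usage: a random symmetric target of rank at most 3.
rng = np.random.default_rng(1)
d, r = 5, 3
B = rng.standard_normal((r, d))
T = np.einsum('ri,rj,rk->ijk', B, B, B)
A0 = rng.standard_normal((r, d))
print(sym_rank_decomposition_loss(A0, T))
```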
no code implementations • NeurIPS 2020 • Yossi Arjevani, Michael Field
We consider the optimization problem associated with fitting two-layer ReLU networks with respect to the squared loss, where labels are generated by a target network.
no code implementations • 24 Jun 2020 • Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan
We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\|\le \epsilon$) using $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed.
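For context, the Hessian-vector product oracle used by such methods never forms the Hessian explicitly; a minimal sketch of one way to realize it, via finite differences of gradients (an illustration of the oracle, not the paper's algorithm), is:

```python
import numpy as np

def hvp_finite_diff(grad_fn, x, v, eps=1e-5):
    """Approximate the Hessian-vector product H(x) v using two gradient calls,
    so second-order information is obtained at first-order cost."""
    return (grad_fn(x + eps * v) - grad_fn(x - eps * v)) / (2 * eps)

# Illustrative check on a quadratic F(x) = 0.5 x^T A x, whose Hessian is A.
rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
A = M @ M.T
grad_fn = lambda x: A @ x
x, v = rng.standard_normal(5), rng.standard_normal(5)
print(np.allclose(hvp_finite_diff(grad_fn, x, v), A @ v, atol=1e-3))
```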
no code implementations • NeurIPS 2020 • Yossi Arjevani, Joan Bruna, Bugra Can, Mert Gürbüzbalaban, Stefanie Jegelka, Hongzhou Lin
We introduce a framework for designing primal methods under the decentralized optimization setting where local functions are smooth and strongly convex.
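For context, a standard primal baseline in this setting is decentralized gradient descent with a doubly stochastic mixing matrix. The sketch below illustrates that baseline, not the framework proposed in the paper; the local quadratics and the complete-graph mixing matrix are assumptions made for the example:

```python
import numpy as np

def decentralized_gd(grads, W, X0, step, iters):
    """Decentralized gradient descent: each node averages its iterate with its
    neighbours via the doubly stochastic mixing matrix W, then takes a step
    along its own local gradient."""
    X = X0.copy()                                   # row i = node i's iterate
    for _ in range(iters):
        G = np.array([g(x) for g, x in zip(grads, X)])
        X = W @ X - step * G
    return X.mean(axis=0)

# Example: 3 nodes with local functions f_i(x) = 0.5 * c_i * ||x - m_i||^2.
coefs = [1.0, 2.0, 3.0]
centers = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
grads = [lambda x, c=c, m=m: c * (x - m) for c, m in zip(coefs, centers)]
W = np.full((3, 3), 1.0 / 3.0)                      # complete-graph averaging
print(decentralized_gd(grads, W, np.zeros((3, 2)), step=0.1, iters=500))
```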
no code implementations • 23 Mar 2020 • Yossi Arjevani, Michael Field
We consider the optimization problem associated with fitting two-layer ReLU networks with $k$ hidden neurons, where labels are assumed to be generated by a (teacher) neural network.
no code implementations • 9 Feb 2020 • Yossi Arjevani, Amit Daniely, Stefanie Jegelka, Hongzhou Lin
Recent advances in randomized incremental methods for minimizing $L$-smooth $\mu$-strongly convex finite sums have culminated in tight complexities of $\tilde{O}((n+\sqrt{n L/\mu})\log(1/\epsilon))$ and $O(n+\sqrt{nL/\epsilon})$ for the strongly convex ($\mu>0$) and convex ($\mu=0$) cases, respectively, where $n$ denotes the number of individual functions.
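For a sense of scale (an illustrative instance, not a figure from the paper): with $n=10^4$ component functions and condition number $L/\mu=10^2$, the strongly convex bound gives $n+\sqrt{nL/\mu}=10^4+10^3$ oracle calls per $\log(1/\epsilon)$ factor, so the incremental term $\sqrt{nL/\mu}$ is dominated by $n$.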
no code implementations • 26 Dec 2019 • Yossi Arjevani, Michael Field
We consider the optimization problem associated with fitting two-layer ReLU networks with respect to the squared loss, where labels are assumed to be generated by a target network.
no code implementations • 5 Dec 2019 • Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, Blake Woodworth
We lower bound the complexity of finding $\epsilon$-stationary points (with gradient norm at most $\epsilon$) using stochastic first-order methods.
no code implementations • 26 Jun 2018 • Yossi Arjevani, Ohad Shamir, Nathan Srebro
We provide tight finite-time convergence bounds for gradient descent and stochastic gradient descent on quadratic functions, when the gradients are delayed and reflect iterates from $\tau$ rounds ago.
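A minimal sketch of the delayed-gradient setting on a quadratic (illustrative only; the specific matrix, step size, and delay are assumptions for the example):

```python
import numpy as np

def delayed_gd(A, b, x0, step, tau, iters):
    """Gradient descent on the quadratic f(x) = 0.5 x^T A x - b^T x, where the
    gradient applied at round t was evaluated at the iterate from tau rounds
    earlier (stale gradients, as in asynchronous or distributed settings)."""
    history = [x0.copy()]
    x = x0.copy()
    for t in range(iters):
        x_stale = history[max(0, t - tau)]           # iterate from tau rounds ago
        x = x - step * (A @ x_stale - b)             # delayed gradient step
        history.append(x.copy())
    return x

# Illustrative run on a small quadratic with delay tau = 5.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
x = delayed_gd(A, b, np.zeros(2), step=0.05, tau=5, iters=2000)
print(np.linalg.norm(A @ x - b))   # small residual: the delayed iteration still converges
```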
no code implementations • NeurIPS 2017 • Yossi Arjevani
We study the conditions under which one is able to efficiently apply variance-reduction and acceleration schemes to finite-sum problems.
no code implementations • NeurIPS 2017 • Yossi Arjevani
We study the conditions under which one is able to efficiently apply variance-reduction and acceleration schemes to finite-sum optimization problems.
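A canonical variance-reduction scheme for such finite sums is SVRG; below is a minimal sketch of it on a least-squares finite sum (an illustration of the setting, not code from the paper):

```python
import numpy as np

def svrg(grads, full_grad, x0, step, epochs, m):
    """SVRG: at each epoch, compute the full gradient at a snapshot point, then
    take m inner steps using variance-reduced stochastic gradient estimates."""
    n = len(grads)
    x_snapshot = x0.copy()
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        mu = full_grad(x_snapshot)             # full gradient at the snapshot
        x = x_snapshot.copy()
        for _ in range(m):
            i = rng.integers(n)
            g = grads[i](x) - grads[i](x_snapshot) + mu   # variance-reduced estimate
            x = x - step * g
        x_snapshot = x
    return x_snapshot

# Example: least squares (1/n) sum_i 0.5 * (a_i^T x - b_i)^2 as a finite sum.
rng = np.random.default_rng(4)
n, d = 200, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
grads = [lambda x, a=A[i], y=b[i]: (a @ x - y) * a for i in range(n)]
full_grad = lambda x: A.T @ (A @ x - b) / n
x = svrg(grads, full_grad, np.zeros(d), step=0.01, epochs=30, m=2 * n)
print(np.linalg.norm(full_grad(x)))            # gradient norm after the epochs
```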
no code implementations • ICML 2017 • Yossi Arjevani, Ohad Shamir
Finite-sum optimization problems are ubiquitous in machine learning, and are commonly solved using first-order methods which rely on gradient computations.
no code implementations • NeurIPS 2016 • Yossi Arjevani, Ohad Shamir
Many canonical machine learning problems boil down to a convex optimization problem with a finite sum structure.
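Concretely, the structure in question is $\min_x \tfrac{1}{n}\sum_{i=1}^{n} f_i(x)$, as in regularized empirical risk minimization, where each $f_i$ is the loss incurred on a single training example.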
no code implementations • 11 May 2016 • Yossi Arjevani, Ohad Shamir
We consider a broad class of first-order optimization algorithms which are "oblivious", in the sense that their step sizes are scheduled regardless of the function under consideration, except for limited side-information such as smoothness or strong convexity parameters.
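To make the notion concrete, here is a minimal sketch of an oblivious scheme (illustrative, not taken from the paper): gradient descent whose step sizes depend only on a smoothness bound $L$, never on the particular function being minimized:

```python
import numpy as np

def oblivious_gd(grad, x0, L, iters):
    """Gradient descent with the oblivious step-size schedule 1/L: the steps
    are fixed in advance using only the smoothness parameter, independently
    of the function under consideration."""
    x = x0.copy()
    for _ in range(iters):
        x = x - (1.0 / L) * grad(x)
    return x

# Illustrative use on a smooth convex quadratic with known smoothness bound.
A = np.diag([1.0, 4.0])              # Hessian eigenvalues 1 and 4, so L = 4
grad = lambda x: A @ x
print(oblivious_gd(grad, np.array([3.0, -2.0]), L=4.0, iters=100))
```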
no code implementations • NeurIPS 2015 • Yossi Arjevani, Ohad Shamir
We study the fundamental limits to communication-efficient distributed methods for convex learning and optimization, under different assumptions on the information available to individual machines, and the types of functions considered.
no code implementations • 23 Mar 2015 • Yossi Arjevani, Shai Shalev-Shwartz, Ohad Shamir
This, in turn, reveals a powerful connection between a class of optimization algorithms and the analytic theory of polynomials, whereby new lower and upper bounds are derived.
no code implementations • 23 Oct 2014 • Yossi Arjevani
In this thesis we develop a novel framework to study smooth and strongly convex optimization algorithms, both deterministic and stochastic.