1 code implementation • 16 Feb 2023 • Hadar Sivan, Moshe Gabel, Assaf Schuster
Popular machine learning approaches forgo second-order information due to the difficulty of computing curvature in high dimensions.
no code implementations • 15 Jul 2022 • James Gleeson, Daniel Snider, Yvonne Yang, Moshe Gabel, Eyal de Lara, Gennady Pekhimenko
We show that simulator kernel fusion speedups with a simple simulator are $11.3\times$ and increase by up to $1024\times$ as simulator complexity increases in terms of memory bandwidth requirements.
1 code implementation • 8 Feb 2021 • James Gleeson, Srivatsan Krishnan, Moshe Gabel, Vijay Janapa Reddi, Eyal de Lara, Gennady Pekhimenko
Deep reinforcement learning (RL) has made groundbreaking advancements in robotics, data center management and other applications.
no code implementations • ICML 2020 • Gal Yehuda, Moshe Gabel, Assaf Schuster
Can deep neural networks learn to solve any task, and in particular problems of high complexity?
no code implementations • 26 Jul 2019 • Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster
We propose DANA: a novel technique for asynchronous distributed SGD with momentum that mitigates gradient staleness by computing the gradient on an estimated future position of the model's parameters.
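The core idea of computing the gradient at an estimated future position of the parameters can be sketched as follows. This is a minimal illustration under simplified assumptions (plain SGD with heavy-ball momentum and a known staleness `delay`), not the paper's actual implementation; the names `dana_gradient` and `grad_fn` are hypothetical:

```python
import numpy as np

def dana_gradient(params, momentum, grad_fn, delay, lr, momentum_coef=0.9):
    """Sketch of the DANA idea: extrapolate the parameters forward by the
    momentum the master is expected to apply during the worker's `delay`
    steps, then evaluate the gradient at that estimated future position.
    This mitigates the staleness of the gradient when it finally arrives."""
    future = params.copy()
    v = momentum.copy()
    for _ in range(delay):
        # Assume the master keeps applying (decayed) momentum while this
        # worker computes; each step moves the parameters by -lr * v.
        v = momentum_coef * v
        future = future - lr * v
    return grad_fn(future)
```

For example, with `delay=0` the worker simply evaluates the gradient at the current parameters; with a positive `delay` it evaluates it at the extrapolated position, so the returned gradient better matches the model state at the time the update is applied.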
no code implementations • ICLR 2019 • Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster
We propose DANA, a novel approach that scales out of the box to large clusters using the same hyperparameters and learning schedule optimized for training on a single worker, while maintaining similar final accuracy without additional overhead.