Search Results for author: Asaf Cassel

Found 10 papers, 0 papers with code

Multi-turn Reinforcement Learning from Preference Human Feedback

no code implementations • 23 May 2024 • Lior Shani, Aviv Rosenberg, Asaf Cassel, Oran Lang, Daniele Calandriello, Avital Zipori, Hila Noga, Orgad Keller, Bilal Piot, Idan Szpektor, Avinatan Hassidim, Yossi Matias, Rémi Munos

Reinforcement Learning from Human Feedback (RLHF) has become the standard approach for aligning Large Language Models (LLMs) with human preferences, allowing LLMs to demonstrate remarkable abilities in various tasks.

Reinforcement Learning (RL)

Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback

no code implementations • 13 May 2024 • Asaf Cassel, Haipeng Luo, Aviv Rosenberg, Dmitry Sotnikov

In many real-world applications, it is hard to provide a reward signal in each step of a Reinforcement Learning (RL) process and more natural to give feedback when an episode ends.

Reinforcement Learning (RL)

Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation

no code implementations • 2 Mar 2023 • Orin Levy, Alon Cohen, Asaf Cassel, Yishay Mansour

To the best of our knowledge, our algorithm is the first efficient rate optimal regret minimization algorithm for adversarial CMDPs that operates under the minimal standard assumption of online function approximation.

regression

Eluder-based Regret for Stochastic Contextual MDPs

no code implementations • 27 Nov 2022 • Orin Levy, Asaf Cassel, Alon Cohen, Yishay Mansour

To the best of our knowledge, our algorithm is the first efficient and rate-optimal regret minimization algorithm for CMDPs that operates under the general offline function approximation setting.

regression

Rate-Optimal Online Convex Optimization in Adaptive Linear Control

no code implementations • 3 Jun 2022 • Asaf Cassel, Alon Cohen, Tomer Koren

We consider the problem of controlling an unknown linear dynamical system under adversarially changing convex costs and full feedback of both the state and cost function.

Efficient Online Linear Control with Stochastic Convex Costs and Unknown Dynamics

no code implementations • 2 Mar 2022 • Asaf Cassel, Alon Cohen, Tomer Koren

We consider the problem of controlling an unknown linear dynamical system under a stochastic convex cost and full feedback of both the state and cost function.

Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with $\sqrt{T}$ Regret

no code implementations • 25 Feb 2021 • Asaf Cassel, Tomer Koren

We consider the task of learning to control a linear dynamical system under fixed quadratic costs, known as the Linear Quadratic Regulator (LQR) problem.

Bandit Linear Control

no code implementations • NeurIPS 2020 • Asaf Cassel, Tomer Koren

We consider the problem of controlling a known linear dynamical system under stochastic noise, adversarially chosen costs, and bandit feedback.

Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently

no code implementations • ICML 2020 • Asaf Cassel, Alon Cohen, Tomer Koren

We consider the problem of learning in Linear Quadratic Control systems whose transition parameters are initially unknown.

A General Framework for Bandit Problems Beyond Cumulative Objectives

no code implementations • 4 Jun 2018 • Asaf Cassel, Shie Mannor, Assaf Zeevi

Unlike the case of cumulative criteria, in the problems we study here the oracle policy, that knows the problem parameters a priori and is used to "center" the regret, is not trivial.

Multi-Armed Bandits

