no code implementations • 23 May 2024 • Lior Shani, Aviv Rosenberg, Asaf Cassel, Oran Lang, Daniele Calandriello, Avital Zipori, Hila Noga, Orgad Keller, Bilal Piot, Idan Szpektor, Avinatan Hassidim, Yossi Matias, Rémi Munos
Reinforcement Learning from Human Feedback (RLHF) has become the standard approach for aligning Large Language Models (LLMs) with human preferences, allowing LLMs to demonstrate remarkable abilities in various tasks.
no code implementations • 13 May 2024 • Asaf Cassel, Haipeng Luo, Aviv Rosenberg, Dmitry Sotnikov
In many real-world applications, it is hard to provide a reward signal in each step of a Reinforcement Learning (RL) process and more natural to give feedback when an episode ends.
no code implementations • 2 Mar 2023 • Orin Levy, Alon Cohen, Asaf Cassel, Yishay Mansour
To the best of our knowledge, our algorithm is the first efficient and rate-optimal regret minimization algorithm for adversarial CMDPs that operates under the minimal standard assumption of online function approximation.
no code implementations • 27 Nov 2022 • Orin Levy, Asaf Cassel, Alon Cohen, Yishay Mansour
To the best of our knowledge, our algorithm is the first efficient and rate-optimal regret minimization algorithm for CMDPs that operates under the general offline function approximation setting.
no code implementations • 3 Jun 2022 • Asaf Cassel, Alon Cohen, Tomer Koren
We consider the problem of controlling an unknown linear dynamical system under adversarially changing convex costs and full feedback of both the state and cost function.
no code implementations • 2 Mar 2022 • Asaf Cassel, Alon Cohen, Tomer Koren
We consider the problem of controlling an unknown linear dynamical system under a stochastic convex cost and full feedback of both the state and cost function.
no code implementations • 25 Feb 2021 • Asaf Cassel, Tomer Koren
We consider the task of learning to control a linear dynamical system under fixed quadratic costs, known as the Linear Quadratic Regulator (LQR) problem.
no code implementations • NeurIPS 2020 • Asaf Cassel, Tomer Koren
We consider the problem of controlling a known linear dynamical system under stochastic noise, adversarially chosen costs, and bandit feedback.
no code implementations • ICML 2020 • Asaf Cassel, Alon Cohen, Tomer Koren
We consider the problem of learning in Linear Quadratic Control systems whose transition parameters are initially unknown.
no code implementations • 4 Jun 2018 • Asaf Cassel, Shie Mannor, Assaf Zeevi
Unlike the case of cumulative criteria, in the problems we study here, the oracle policy, which knows the problem parameters a priori and is used to "center" the regret, is not trivial.