no code implementations • 24 Dec 2023 • Paul Daoudi, Mathias Formoso, Othman Gaizi, Achraf Azize, Evrard Garcelon
A precondition for deploying a Reinforcement Learning agent on a real-world system is providing guarantees on the learning process.
no code implementations • 7 Mar 2023 • Cathy Li, Jana Sotáková, Emily Wenger, Mohamed Malhou, Evrard Garcelon, Francois Charton, Kristin Lauter
However, this attack assumes access to millions of eavesdropped LWE samples and fails at higher Hamming weights or dimensions.
no code implementations • 13 Dec 2021 • Evrard Garcelon, Vashist Avadhanula, Alessandro Lazaric, Matteo Pirotta
We consider a multi-armed bandit setting where, at the beginning of each round, the learner receives noisy, independent, and possibly biased \emph{evaluations} of the true reward of each arm, and selects $K$ arms with the objective of accumulating as much reward as possible over $T$ rounds.
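A minimal simulation of this setting (not the paper's algorithm): Gaussian rewards and Gaussian evaluation noise are illustrative assumptions, and the naive policy shown simply trusts the pre-round evaluations.

```python
import numpy as np

rng = np.random.default_rng(0)

n_arms, K, T = 10, 3, 1000            # illustrative problem sizes
true_means = rng.uniform(0, 1, n_arms)
bias = rng.normal(0, 0.05, n_arms)    # evaluations may be systematically biased

total_reward = 0.0
for t in range(T):
    # noisy, independent, possibly biased evaluations of each arm's true reward
    evaluations = true_means + bias + rng.normal(0, 0.1, n_arms)
    # naive strategy: trust the evaluations and pull the K highest-ranked arms
    chosen = np.argsort(evaluations)[-K:]
    total_reward += rng.normal(true_means[chosen], 0.1).sum()
```

The interesting question studied in the paper is how much such evaluations can reduce regret compared with ignoring them.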
no code implementations • 11 Dec 2021 • Evrard Garcelon, Kamalika Chaudhuri, Vianney Perchet, Matteo Pirotta
Contextual bandit algorithms are widely used in domains where it is desirable to provide a personalized service by leveraging contextual information, which may contain sensitive data that needs to be protected.
no code implementations • 2 Dec 2021 • Paul Luyo, Evrard Garcelon, Alessandro Lazaric, Matteo Pirotta
We first consider the setting of linear-mixture MDPs (Ayoub et al., 2020) (a.k.a.\ the model-based setting) and provide a unified framework for analyzing joint and local differentially private (DP) exploration.
no code implementations • ICLR 2022 • Yunchang Yang, Tianhao Wu, Han Zhong, Evrard Garcelon, Matteo Pirotta, Alessandro Lazaric, LiWei Wang, Simon S. Du
We also obtain a new upper bound for conservative low-rank MDPs.
no code implementations • 17 Mar 2021 • Evrard Garcelon, Vianney Perchet, Matteo Pirotta
A critical aspect of bandit methods is that they require observing the contexts, i.e., individual or group-level data, and the rewards in order to solve the sequential problem.
no code implementations • NeurIPS 2021 • Evrard Garcelon, Vianney Perchet, Ciara Pike-Burke, Matteo Pirotta
Motivated by this, we study privacy in the context of finite-horizon Markov Decision Processes (MDPs) by requiring information to be obfuscated on the user side.
no code implementations • NeurIPS 2020 • Evrard Garcelon, Baptiste Roziere, Laurent Meunier, Jean Tarbouriech, Olivier Teytaud, Alessandro Lazaric, Matteo Pirotta
In many of these domains, malicious agents may have incentives to attack the bandit algorithm to induce it to perform a desired behavior.
no code implementations • 8 Feb 2020 • Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta
In this case, it is desirable to deploy online learning algorithms (e.g., a multi-armed bandit algorithm) that interact with the system to learn a better or optimal policy, under the constraint that, during the learning process, the performance is almost never worse than that of the baseline itself.
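The baseline constraint can be sketched as follows; this is a generic conservative check, not the paper's algorithm, and it assumes the baseline arm's mean reward is known.

```python
import numpy as np

rng = np.random.default_rng(2)

n_arms, T, alpha = 4, 1000, 0.1        # alpha: tolerated fraction of baseline loss
means = rng.uniform(0, 1, n_arms)
baseline = 0                           # arm 0 plays the role of the known baseline

cum_reward, cum_baseline = 0.0, 0.0
for t in range(T):
    candidate = int(rng.integers(n_arms))  # stand-in for the learner's exploratory choice
    # conservative check: deviate from the baseline only if, even after a
    # worst-case (zero-reward) exploratory pull, the accumulated reward stays
    # within a (1 - alpha) fraction of the baseline's cumulative performance
    if cum_reward >= (1 - alpha) * (cum_baseline + means[baseline]):
        arm = candidate
    else:
        arm = baseline
    cum_reward += rng.normal(means[arm], 0.1)
    cum_baseline += means[baseline]
```

The design choice here is to compare against the baseline's *expected* cumulative reward; the algorithms analyzed in the paper refine this with confidence bounds on the unknown arms.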
no code implementations • 8 Feb 2020 • Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta
While learning in an unknown Markov Decision Process (MDP), an agent should trade off exploration, to discover new information about the MDP, against exploitation of its current knowledge to maximize the reward.
no code implementations • ICML 2020 • Jean Tarbouriech, Evrard Garcelon, Michal Valko, Matteo Pirotta, Alessandro Lazaric
Many popular reinforcement learning problems (e.g., navigation in a maze, some Atari games, mountain car) are instances of the episodic setting under its stochastic shortest path (SSP) formulation, where an agent has to achieve a goal state while minimizing the cumulative cost.
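To make the SSP objective concrete, here is a hedged sketch of value iteration on a toy deterministic chain (the example MDP is invented for illustration; the paper studies the learning problem, not planning with a known model):

```python
import numpy as np

# Toy deterministic chain SSP: states 0..4, goal state = 4.  Action "right"
# advances one state at cost 1; action "stay" loops at cost 1.  Value
# iteration computes V*, the minimal cumulative cost-to-goal from each state.
n_states, goal = 5, 4
V = np.zeros(n_states)
for _ in range(100):                       # iterate the Bellman operator to a fixed point
    for s in range(n_states):
        if s == goal:
            V[s] = 0.0                     # the goal is absorbing and cost-free
        else:
            V[s] = min(1.0 + V[min(s + 1, goal)],  # move right
                       1.0 + V[s])                  # stay in place
```

On this chain the optimal cost-to-goal from state 0 is simply the distance to the goal, so `V` converges to `[4, 3, 2, 1, 0]`.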
no code implementations • 10 Jul 2018 • Rémy Degenne, Evrard Garcelon, Vianney Perchet
We consider the classical stochastic multi-armed bandit but where, from time to time and roughly with frequency $\epsilon$, an extra observation is gathered by the agent for free.
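One way to simulate this setting on top of a standard UCB learner; the rule for spending the free observation (probing the least-observed arm) is an illustrative choice, not the strategy analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

n_arms, T, eps = 5, 2000, 0.1
true_means = rng.uniform(0, 1, n_arms)
counts = np.ones(n_arms)                  # one initialization pull per arm
sums = rng.normal(true_means, 0.1)

for t in range(1, T + 1):
    # standard UCB pull (the paid observation)
    ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
    arm = int(np.argmax(ucb))
    sums[arm] += rng.normal(true_means[arm], 0.1)
    counts[arm] += 1
    # with probability eps, an extra observation of a chosen arm comes for free
    if rng.random() < eps:
        extra = int(np.argmin(counts))    # e.g., probe the least-observed arm
        sums[extra] += rng.normal(true_means[extra], 0.1)
        counts[extra] += 1
```

The question the paper addresses is how to spend these roughly $\epsilon T$ free observations to improve the regret guarantee.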