no code implementations • 25 Oct 2023 • Matthieu Jonckheere, Chiara Mignacco, Gilles Stoltz
Structured reinforcement learning leverages policies with advantageous properties to reach better performance, particularly in scenarios where exploration poses challenges.
no code implementations • 31 May 2023 • Evgenii Chzhen, Christophe Giraud, Gilles Stoltz
We consider the problem of minimizing a convex function over a closed convex set, with Projected Gradient Descent (PGD).
no code implementations • 30 Sep 2022 • Antoine Barrier, Aurélien Garivier, Gilles Stoltz
All these new upper and lower bounds generalize existing bounds based, e.g., on gaps between distributions.
no code implementations • 1 Jun 2022 • Zhen Li, Gilles Stoltz
At each round, given the stochastic i.i.d. context $\mathbf{x}_t$ and the arm picked $a_t$ (corresponding, e.g., to a discount level), a customer conversion may be obtained, in which case a reward $r(a_t,\mathbf{x}_t)$ is gained and vector costs $c(a_t,\mathbf{x}_t)$ are suffered (corresponding, e.g., to losses of earnings).
no code implementations • NeurIPS 2021 • Evgenii Chzhen, Christophe Giraud, Gilles Stoltz
We provide a setting and a general approach to fair online learning with stochastic sensitive and non-sensitive contexts.
no code implementations • 5 Oct 2020 • Hédi Hadiji, Sébastien Gerchinovitz, Jean-Michel Loubes, Gilles Stoltz
We consider the bandit-based framework for diversity-preserving recommendations introduced by Celis et al. (2019), who approached it in the case of a polytope mainly by a reduction to the setting of linear bandits.
no code implementations • 5 Jun 2020 • Hédi Hadiji, Gilles Stoltz
We consider stochastic bandit problems with $K$ arms, each associated with a bounded distribution supported on the range $[m, M]$.
no code implementations • 5 Jun 2020 • Malo Huard, Rémy Garnier, Gilles Stoltz
We revisit the interest of classical statistical techniques for sales forecasting like exponential smoothing and extensions thereof (such as Holt's linear trend method).
no code implementations • 28 Jan 2019 • Margaux Brégère, Pierre Gaillard, Yannig Goude, Gilles Stoltz
We propose a contextual-bandit approach for demand side management by offering price incentives.
no code implementations • 30 Nov 2018 • Raphaël Deswarte, Véronique Gervais, Gilles Stoltz, Sébastien da Veiga
An extension of the deterministic aggregation approach is thus proposed in this paper to provide such multi-step-ahead forecasts.
no code implementations • 29 May 2018 • Pierre Gaillard, Sébastien Gerchinovitz, Malo Huard, Gilles Stoltz
In the case of sequentially revealed features, we also derive an asymptotic regret bound of $d B^2 \ln T$ for any individual sequence of features and bounded observations.
1 code implementation • 14 May 2018 • Aurélien Garivier, Hédi Hadiji, Pierre Ménard, Gilles Stoltz
We were able to obtain this non-parametric bi-optimality result while working hard to streamline the proofs (of previously known regret bounds and thus of the new analyses carried out); a second merit of the present contribution is therefore to provide a review of proofs of classical regret bounds for index-based strategies for $K$-armed stochastic bandits.
no code implementations • 23 Feb 2016 • Aurélien Garivier, Pierre Ménard, Gilles Stoltz
We revisit lower bounds on the regret in the case of multi-armed bandit problems.
no code implementations • 10 Feb 2014 • Shie Mannor, Vianney Perchet, Gilles Stoltz
We show that it is impossible, in general, to approach the best target set in hindsight and propose achievable though ambitious alternative goals.
no code implementations • 10 Feb 2014 • Pierre Gaillard, Gilles Stoltz, Tim van Erven
We study online aggregation of the predictions of experts, and first show new second-order regret bounds in the standard setting, which are obtained via a version of the Prod algorithm (and also a version of the polynomially weighted average algorithm) with multiple learning rates.
no code implementations • 23 May 2013 • Shie Mannor, Vianney Perchet, Gilles Stoltz
In this paper we provide primal conditions on a convex set to be approachable with partial monitoring.
no code implementations • NeurIPS 2012 • Nicolò Cesa-Bianchi, Pierre Gaillard, Gabor Lugosi, Gilles Stoltz
Mirror descent with an entropic regularizer is known to achieve shifting regret bounds that are logarithmic in the dimension.
no code implementations • NeurIPS 2008 • Sébastien Bubeck, Gilles Stoltz, Csaba Szepesvári, Rémi Munos
We consider a generalization of stochastic bandit problems where the set of arms, X, is allowed to be a generic topological space.