Search Results for author: Gilles Stoltz

Found 18 papers, 1 paper with code

Symphony of experts: orchestration with adversarial insights in reinforcement learning

no code implementations • 25 Oct 2023 • Matthieu Jonckheere, Chiara Mignacco, Gilles Stoltz

Structured reinforcement learning leverages policies with advantageous properties to reach better performance, particularly in scenarios where exploration poses challenges.

Decision Making reinforcement-learning

Parameter-free projected gradient descent

no code implementations • 31 May 2023 • Evgenii Chzhen, Christophe Giraud, Gilles Stoltz

We consider the problem of minimizing a convex function over a closed convex set, with Projected Gradient Descent (PGD).

Stochastic Optimization
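
As a rough illustration of the setting named in the abstract above, here is a minimal sketch of plain Projected Gradient Descent with a fixed step size. It is not the parameter-free variant studied in the paper; the function names, the Euclidean-ball constraint set, the quadratic objective, and the step size are all assumptions chosen for the example.

```python
import math


def project_onto_ball(x, radius=1.0):
    """Euclidean projection onto the closed ball of the given radius (assumed constraint set)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [radius * v / norm for v in x]


def projected_gradient_descent(grad, project, x0, step_size, n_steps):
    """Alternate a gradient step with a projection back onto the feasible set."""
    x = project(x0)
    for _ in range(n_steps):
        g = grad(x)
        x = project([xi - step_size * gi for xi, gi in zip(x, g)])
    return x


# Hypothetical objective: squared distance to a point outside the unit ball.
target = [3.0, 4.0]


def grad_sq_dist(x):
    return [2.0 * (xi - ti) for xi, ti in zip(x, target)]


x_hat = projected_gradient_descent(grad_sq_dist, project_onto_ball, [0.0, 0.0], 0.1, 200)
print(x_hat)  # close to [0.6, 0.8], the projection of the target onto the unit ball
```

The fixed step size here is hand-picked, which is exactly the kind of tuning a parameter-free method aims to avoid.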

On Best-Arm Identification with a Fixed Budget in Non-Parametric Multi-Armed Bandits

no code implementations • 30 Sep 2022 • Antoine Barrier, Aurélien Garivier, Gilles Stoltz

All these new upper and lower bounds generalize existing bounds based, e.g., on gaps between distributions.

Multi-Armed Bandits

Contextual Bandits with Knapsacks for a Conversion Model

no code implementations • 1 Jun 2022 • Zhen Li, Gilles Stoltz

At each round, given the stochastic i.i.d. context $\mathbf{x}_t$ and the arm picked $a_t$ (corresponding, e.g., to a discount level), a customer conversion may be obtained, in which case a reward $r(a_t,\mathbf{x}_t)$ is gained and vector costs $c(a_t,\mathbf{x}_t)$ are suffered (corresponding, e.g., to losses of earnings).

Multi-Armed Bandits

A Unified Approach to Fair Online Learning via Blackwell Approachability

no code implementations • NeurIPS 2021 • Evgenii Chzhen, Christophe Giraud, Gilles Stoltz

We provide a setting and a general approach to fair online learning with stochastic sensitive and non-sensitive contexts.

Fairness

Diversity-Preserving K-Armed Bandits, Revisited

no code implementations • 5 Oct 2020 • Hédi Hadiji, Sébastien Gerchinovitz, Jean-Michel Loubes, Gilles Stoltz

We consider the bandit-based framework for diversity-preserving recommendations introduced by Celis et al. (2019), who approached it in the case of a polytope mainly by a reduction to the setting of linear bandits.

Adaptation to the Range in $K$-Armed Bandits

no code implementations • 5 Jun 2020 • Hédi Hadiji, Gilles Stoltz

We consider stochastic bandit problems with $K$ arms, each associated with a bounded distribution supported on the range $[m, M]$.

Hierarchical robust aggregation of sales forecasts at aggregated levels in e-commerce, based on exponential smoothing and Holt's linear trend method

no code implementations • 5 Jun 2020 • Malo Huard, Rémy Garnier, Gilles Stoltz

We revisit the interest of classical statistical techniques for sales forecasting, such as exponential smoothing and extensions thereof (e.g., Holt's linear trend method).

Learning Theory
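
For readers unfamiliar with the method named in the abstract above, the following is a minimal sketch of Holt's linear trend method (double exponential smoothing). It does not reproduce the hierarchical robust aggregation of the paper; the function name, the initialization of level and trend, and the toy series are assumptions made for illustration.

```python
def holt_forecast(series, alpha, beta, horizon=1):
    """Forecast `horizon` steps ahead with Holt's linear trend method.

    alpha smooths the level, beta smooths the trend (both in [0, 1]).
    Initialization (assumed): level = first value, trend = first difference.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        previous_level = level
        # Level: blend the new observation with the previous one-step forecast.
        level = alpha * y + (1 - alpha) * (previous_level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level + horizon * trend


# Hypothetical, perfectly linear sales series: the method extrapolates it exactly.
sales = [1.0, 3.0, 5.0, 7.0, 9.0]
print(holt_forecast(sales, alpha=0.5, beta=0.5, horizon=1))  # 11.0
```

Setting `beta` to 0 freezes the trend at its initial value, while `alpha` close to 1 makes the level follow the latest observation almost exactly.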

Target Tracking for Contextual Bandits: Application to Demand Side Management

no code implementations • 28 Jan 2019 • Margaux Brégère, Pierre Gaillard, Yannig Goude, Gilles Stoltz

We propose a contextual-bandit approach for demand side management by offering price incentives.

Management Multi-Armed Bandits

Sequential model aggregation for production forecasting

no code implementations • 30 Nov 2018 • Raphaël Deswarte, Véronique Gervais, Gilles Stoltz, Sébastien da Veiga

An extension of the deterministic aggregation approach is thus proposed in this paper to provide such multi-step-ahead forecasts.

regression

Uniform regret bounds over $\mathbb{R}^d$ for the sequential linear regression problem with the square loss

no code implementations • 29 May 2018 • Pierre Gaillard, Sébastien Gerchinovitz, Malo Huard, Gilles Stoltz

In the case of sequentially revealed features, we also derive an asymptotic regret bound of $d B^2 \ln T$ for any individual sequence of features and bounded observations.

regression

KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints

1 code implementation • 14 May 2018 • Aurélien Garivier, Hédi Hadiji, Pierre Ménard, Gilles Stoltz

We were able to obtain this non-parametric bi-optimality result while working hard to streamline the proofs (of previously known regret bounds and thus of the new analyses carried out); a second merit of the present contribution is therefore to provide a review of proofs of classical regret bounds for index-based strategies for $K$-armed stochastic bandits.

Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

no code implementations • 23 Feb 2016 • Aurélien Garivier, Pierre Ménard, Gilles Stoltz

We revisit lower bounds on the regret in the case of multi-armed bandit problems.

Approachability in unknown games: Online learning meets multi-objective optimization

no code implementations • 10 Feb 2014 • Shie Mannor, Vianney Perchet, Gilles Stoltz

We show that it is impossible, in general, to approach the best target set in hindsight and propose achievable though ambitious alternative goals.

A Second-order Bound with Excess Losses

no code implementations • 10 Feb 2014 • Pierre Gaillard, Gilles Stoltz, Tim van Erven

We study online aggregation of the predictions of experts, and first show new second-order regret bounds in the standard setting, which are obtained via a version of the Prod algorithm (and also a version of the polynomially weighted average algorithm) with multiple learning rates.

A Primal Condition for Approachability with Partial Monitoring

no code implementations • 23 May 2013 • Shie Mannor, Vianney Perchet, Gilles Stoltz

In this paper we provide primal conditions on a convex set to be approachable with partial monitoring.

Mirror Descent Meets Fixed Share (and feels no regret)

no code implementations • NeurIPS 2012 • Nicolò Cesa-Bianchi, Pierre Gaillard, Gabor Lugosi, Gilles Stoltz

Mirror descent with an entropic regularizer is known to achieve shifting regret bounds that are logarithmic in the dimension.
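
To make the abstract's starting point concrete: on the probability simplex, mirror descent with an entropic regularizer reduces to the familiar multiplicative (exponentiated) weights update. The sketch below shows only that basic update, without the Fixed Share mixing step the title refers to; the function name, learning rate, and loss vectors are assumptions for the example.

```python
import math


def exponentiated_weights_update(weights, losses, learning_rate):
    """One step of entropic mirror descent on the simplex.

    Each weight is multiplied by exp(-learning_rate * loss) and the
    result is renormalized so the weights sum to one again.
    """
    unnormalized = [w * math.exp(-learning_rate * l) for w, l in zip(weights, losses)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical run with three experts and the same loss vector on two rounds.
weights = [1 / 3, 1 / 3, 1 / 3]
for losses in [[1.0, 0.0, 0.5], [1.0, 0.0, 0.5]]:
    weights = exponentiated_weights_update(weights, losses, learning_rate=0.5)

print(weights)  # the expert with zero loss ends up with the largest weight
```

Because the update is multiplicative, weights of poorly performing experts shrink geometrically, which is why tracking a changing best expert usually requires an additional mechanism.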

Online Optimization in X-Armed Bandits

no code implementations • NeurIPS 2008 • Sébastien Bubeck, Gilles Stoltz, Csaba Szepesvári, Rémi Munos

We consider a generalization of stochastic bandit problems where the set of arms, X, is allowed to be a generic topological space.
