no code implementations • 9 Oct 2019 • Victor Gabillon, Rasul Tutunov, Michal Valko, Haitham Bou Ammar
In this paper, we formalise order-robust optimisation as an instance of online learning minimising simple regret, and propose Vroom, a zeroth-order optimisation algorithm capable of achieving vanishing regret in non-stationary environments, while recovering favourable rates under stochastic reward-generating processes.
no code implementations • 3 Sep 2019 • Vasco Lopes, Fabio Maria Carlucci, Pedro M Esperança, Marco Singh, Victor Gabillon, Antoine Yang, Hang Xu, Zewei Chen, Jun Wang
The Neural Architecture Search (NAS) problem is typically formulated as a graph search problem where the goal is to learn the optimal operations over edges in order to maximise a graph-level global objective.
no code implementations • 1 Oct 2018 • Peter L. Bartlett, Victor Gabillon, Michal Valko
The difficulty of optimization is measured in terms of 1) the amount of \emph{noise} $b$ of the function evaluation and 2) the local smoothness, $d$, of the function.
no code implementations • NeurIPS 2017 • Yasin Abbasi, Peter L. Bartlett, Victor Gabillon
We study minimax strategies for the online prediction problem with expert advice.
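The paper analyses exact minimax strategies; as a point of reference, here is a minimal sketch of the standard (non-minimax) exponentially weighted forecaster for prediction with expert advice, using the classical learning-rate tuning. All names are illustrative, not taken from the paper:

```python
import numpy as np

def exponential_weights_regret(loss_matrix, eta=None):
    """Exponentially weighted average forecaster for prediction with
    expert advice (a standard baseline, not the paper's minimax strategy).
    loss_matrix: (T, K) array of losses in [0, 1] for K experts over T rounds.
    Returns the learner's regret against the best fixed expert."""
    T, K = loss_matrix.shape
    if eta is None:
        eta = np.sqrt(8 * np.log(K) / T)   # classical tuning
    weights = np.ones(K)
    learner_loss = 0.0
    for losses in loss_matrix:
        p = weights / weights.sum()        # play the normalised weights
        learner_loss += p @ losses         # expected loss this round
        weights *= np.exp(-eta * losses)   # exponential down-weighting
    return learner_loss - loss_matrix.sum(axis=0).min()

rng = np.random.default_rng(0)
L = rng.uniform(size=(1000, 5))
r = exponential_weights_regret(L)
```

With this tuning the regret is guaranteed to be at most sqrt((T/2) ln K) for losses in [0, 1], which is what the check below verifies.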
no code implementations • 19 Oct 2016 • Yasin Abbasi-Yadkori, Peter L. Bartlett, Victor Gabillon, Alan Malek
We propose the Hit-and-Run algorithm for planning and sampling problems in non-convex spaces.
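The paper extends Hit-and-Run to non-convex spaces; as background, here is a minimal sketch of the classic convex-body version driven by a membership oracle, with chord endpoints found by bisection. Function names and parameters are illustrative:

```python
import numpy as np

def chord_endpoint(x, d, inside, t_max=10.0, iters=40):
    """Largest t with x + t*d inside the set, found by bisection.
    Assumes inside(x) is True and the set fits within radius t_max."""
    lo, hi = 0.0, t_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if inside(x + mid * d):
            lo = mid
        else:
            hi = mid
    return lo

def hit_and_run(x0, inside, n_samples=500, rng=None):
    """Classic Hit-and-Run: from the current point, draw a uniformly
    random direction, then jump to a uniform point on the chord of the
    set through the current point in that direction."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        d = rng.normal(size=x.shape)
        d /= np.linalg.norm(d)            # uniform direction on the sphere
        t_plus = chord_endpoint(x, d, inside)
        t_minus = chord_endpoint(x, -d, inside)
        x = x + rng.uniform(-t_minus, t_plus) * d
        samples.append(x.copy())
    return np.array(samples)

# Example: approximately uniform samples from the unit ball in R^2.
ball = lambda p: np.linalg.norm(p) <= 1.0
pts = hit_and_run(np.zeros(2), ball, n_samples=500, rng=0)
```

For a convex set every point on the sampled chord lies inside the set, so the chain never leaves it; handling non-convex sets (where a line may exit and re-enter) is exactly the difficulty the paper addresses.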
no code implementations • NeurIPS 2013 • Victor Gabillon, Mohammad Ghavamzadeh, Bruno Scherrer
A close look at the literature on this game shows that while ADP algorithms, which have been (almost) entirely based on approximating the value function, have performed poorly in Tetris, the methods that search directly in the space of policies by learning the policy parameters with a black-box optimiser, such as the cross-entropy (CE) method, have achieved the best reported results.
no code implementations • NeurIPS 2013 • Victor Gabillon, Branislav Kveton, Zheng Wen, Brian Eriksson, S. Muthukrishnan
Maximization of submodular functions has wide applications in machine learning and artificial intelligence.
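The paper studies submodular maximisation under bandit feedback; as background, here is a minimal sketch of the classic offline greedy algorithm for monotone submodular maximisation under a cardinality constraint (the Nemhauser–Wolsey–Fisher (1 − 1/e) guarantee), illustrated on maximum coverage, a canonical submodular objective. Names are illustrative:

```python
def greedy_max_coverage(sets, k):
    """Greedy maximisation of the coverage function f(S) = |union of S|,
    which is monotone submodular: pick k sets, each time the one with
    the largest marginal gain over the elements already covered."""
    chosen, covered = [], set()
    for _ in range(k):
        gains = [len(s - covered) if i not in chosen else -1
                 for i, s in enumerate(sets)]
        best = max(range(len(sets)), key=lambda i: gains[i])
        if gains[best] <= 0:
            break                      # no set adds anything new
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
picked, covered = greedy_max_coverage(sets, k=2)
```

Here the greedy rule first takes the 4-element set, then the 3-element set, covering all 7 elements with 2 picks. The bandit setting of the paper replaces the exact marginal gains with noisy feedback.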
no code implementations • NeurIPS 2012 • Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric
We study the problem of identifying the best arm(s) in the stochastic multi-armed bandit setting.
no code implementations • 14 May 2012 • Bruno Scherrer, Victor Gabillon, Mohammad Ghavamzadeh, Matthieu Geist
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods.
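The interpolation MPI performs can be sketched in a few lines on a finite MDP: a greedy policy step followed by m applications of that policy's Bellman operator, so that m = 1 recovers value iteration and m → ∞ recovers policy iteration. This is a minimal illustration with made-up variable names, not the paper's approximate setting:

```python
import numpy as np

def modified_policy_iteration(P, R, gamma=0.9, m=5, n_iter=200):
    """MPI on a finite MDP.
    P: (A, S, S) transition tensor, R: (A, S) rewards.
    Each sweep: greedy policy w.r.t. V, then m partial-evaluation steps."""
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(n_iter):
        Q = R + gamma * (P @ V)          # (A, S) action values
        pi = Q.argmax(axis=0)            # greedy policy
        for _ in range(m):               # m applications of T_pi
            Q = R + gamma * (P @ V)
            V = Q[pi, np.arange(S)]
    return V, pi

# Tiny 2-state MDP: action 1 from state 0 pays 1 and moves to the
# absorbing state 1; everything else pays 0.
P = np.array([[[1., 0.], [0., 1.]],     # action 0: stay put
              [[0., 1.], [0., 1.]]])    # action 1: go to state 1
R = np.array([[0., 0.],
              [1., 0.]])
V, pi = modified_policy_iteration(P, R)
```

The optimal policy takes action 1 in state 0 for value 1, then collects nothing in the absorbing state, so V ≈ (1, 0).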
no code implementations • NeurIPS 2011 • Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric, Sébastien Bubeck
We first propose an algorithm called Gap-based Exploration (GapE) that focuses on the arms whose mean is close to the mean of the best arm in the same bandit (i.e., arms with a small gap).
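A simplified single-bandit sketch of a GapE-style index policy, assuming Bernoulli rewards: pull the arm maximising −(estimated gap) + sqrt(a / pulls), so that small-gap and under-sampled arms get priority. The function name and the exploration parameter `a` are illustrative, not the paper's tuned constant:

```python
import numpy as np

def gap_based_exploration(true_means, budget, a=4.0, seed=0):
    """GapE-style best-arm identification with a fixed budget of pulls.
    gap of arm i = distance between its empirical mean and the best
    *other* empirical mean; index_i = -gap_i + sqrt(a / T_i)."""
    rng = np.random.default_rng(seed)
    K = len(true_means)
    pulls = np.zeros(K, dtype=int)
    sums = np.zeros(K)
    for k in range(K):                       # initialise: pull each arm once
        sums[k] += rng.random() < true_means[k]
        pulls[k] += 1
    for _ in range(budget - K):
        mu = sums / pulls
        best = mu.max()
        second = np.partition(mu, -2)[-2]    # second-largest empirical mean
        gaps = np.where(mu == best, best - second, best - mu)
        idx = -gaps + np.sqrt(a / pulls)     # favour small gaps, few pulls
        k = int(idx.argmax())
        sums[k] += rng.random() < true_means[k]
        pulls[k] += 1
    return int((sums / pulls).argmax())      # recommend the empirical best

arm = gap_based_exploration([0.1, 0.9], budget=1000)
```

With a large gap and a generous budget, the recommended arm is the true best one; the interesting regime for the analysis is when several gaps are small and the budget must be allocated across them.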