Online Preselection with Context Information under the Plackett-Luce Model

11 Feb 2020  ·  Adil El Mesaoudi-Paul, Viktor Bengs, Eyke Hüllermeier ·

We consider an extension of the contextual multi-armed bandit problem, in which, instead of selecting a single alternative (arm), a learner is supposed to make a preselection in the form of a subset of alternatives. More specifically, in each iteration, the learner is presented a set of arms and a context, both described in terms of feature vectors. The task of the learner is to preselect $k$ of these arms, among which a final choice is made in a second step. In our setup, we assume that each arm has a latent (context-dependent) utility, and that feedback on a preselection is produced according to a Plackett-Luce model. We propose the CPPL algorithm, which is inspired by the well-known UCB algorithm, and evaluate this algorithm on synthetic and real data. In particular, we consider an online algorithm selection scenario, which served as a main motivation of our problem setting. Here, an instance (which defines the context) from a certain problem class (such as SAT) can be solved by different algorithms (the arms), but only $k$ of these algorithms can actually be run.

PDF Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here