CCSP Seminar: K. A. Sankararaman, "Adversarial Bandits with Knapsacks"

Thursday, March 14, 2019
5:00 p.m.-6:30 p.m.
AVW 2168
Ajaykrishnan Nageswaran
301 405 3661

Communication, Control and Signal Processing Seminar

Adversarial Bandits with Knapsacks

Karthik Abinav Sankararaman
University of Maryland

In this talk, we will discuss the multi-armed bandits problem with resource constraints, in the adversarial setting. In this problem, we have an interactive and repeated game between the algorithm and an adversary. Given $T$ time-steps, $d$ resources, $m$ actions and budgets $B_1, B_2,\ldots B_d$, the algorithm chooses one of the $m$ actions at each time-step. An adversary then reveals a reward and consumption for each of the $d$ resources, corresponding to this action. The time-step at which the algorithm runs out of the $d$ resources (i.e., the total consumption for resource $j > B_j$), the game stops and the total reward is the sum of rewards obtained until the stopping time. The goal is to maximize the competitive ratio; the ratio of the total reward of the algorithm to the expected reward of a fixed distribution that knows the rewards and consumption for all actions and time-steps beforehand. We give an algorithm for this problem whose competitive ratio is tight (matches the lower-bound). The algorithm achieves the optimal bound, both in expectation and with high-probability. Moreover, the algorithmic framework is modular and can be used in a black-box manner to obtain algorithms to many related problems such as the stochastic, contextual, semi-bandit settings as well as to the more general online constrained convex optimization setting.

The talk is based on this paper.

Audience: Graduate  Undergraduate  Faculty  Post-Docs 


February 2020

26 27 28 29 30 31 1
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
1 2 3 4 5 6 7
Submit an Event