Description

Title: Learning with (Bandit) Feedback from Strategic Stakeholders for Fair Resource Allocation

Abstract: In many sequential learning and decision-making settings, a policy must rely on feedback from strategic stakeholders who themselves have a vested interest in the decisions the policy makes. One such use case arises when we need to fairly allocate resources among users in a shared compute cluster. Since users often do not know their resource requirements a priori, these requirements must be estimated from the users' feedback.

I will present some of our recent work in this setting, where we strive to design bandit algorithms that are efficient (they find Pareto-efficient outcomes), fair (they treat all users fairly), and strategy-proof (a user's best strategy is to report their feedback truthfully) when users are not aware of their own utilities (resource requirements). I will discuss algorithms, asymptotic upper bounds on the three criteria, and some hardness results. I will also present empirical results on a 1000-CPU, 20-user cluster, where our methods find efficient outcomes while empirically satisfying fairness and strategy-proofness.