We consider a variant of social learning in which the agents collectively implement a multi-armed bandit algorithm. Agents arrive sequentially and each acts once, observing the actions and rewards of those "friends" in a social network who arrived earlier. The reward for a given action is an i.i.d. draw from some fixed but unknown distribution; there are no private signals. As a group, the agents face an exploration-exploitation tradeoff, but their self-interested behavior causes learning failures in which many or most agents choose a suboptimal action. The severity of these failures depends on the network and on the agents' behavioral model. Even the basic model, with a complete network and Bayesian-rational agents, is surprisingly not well understood.
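
To make the basic model concrete, here is a minimal simulation sketch of the complete-network case. It rests on assumptions that are ours for illustration and are not part of the results surveyed in the talk: rewards are Bernoulli, every agent holds independent Beta(1,1) priors over the arm means, and each agent myopically picks the arm with the highest posterior mean given the full shared history. The names `simulate_greedy` and `failure_rate` are hypothetical.

```python
import random


def simulate_greedy(means, n_agents, seed):
    """Simulate myopic agents on a complete network (illustrative assumptions).

    Each agent sees every earlier action and reward, then picks the arm
    with the highest posterior mean under a Beta(1,1) prior per arm.
    Returns the list of arms chosen, in arrival order.
    """
    rng = random.Random(seed)
    k = len(means)
    successes = [0] * k  # observed reward-1 outcomes per arm
    pulls = [0] * k      # number of times each arm was chosen
    choices = []
    for _ in range(n_agents):
        # Posterior mean of a Beta(1 + s, 1 + n - s) distribution.
        post = [(successes[a] + 1) / (pulls[a] + 2) for a in range(k)]
        best = max(post)
        # Break ties uniformly at random.
        arm = rng.choice([a for a in range(k) if post[a] == best])
        reward = 1 if rng.random() < means[arm] else 0
        successes[arm] += reward
        pulls[arm] += 1
        choices.append(arm)
    return choices


def failure_rate(means, n_agents, n_runs):
    """Fraction of runs in which most late agents pick a suboptimal arm."""
    best_arm = max(range(len(means)), key=lambda a: means[a])
    failures = 0
    for seed in range(n_runs):
        choices = simulate_greedy(means, n_agents, seed)
        tail = choices[n_agents // 2:]  # second half of the arrivals
        if sum(c != best_arm for c in tail) > len(tail) / 2:
            failures += 1
    return failures / n_runs


if __name__ == "__main__":
    # Two arms with a modest gap; the estimate depends on these settings.
    print(failure_rate([0.6, 0.4], n_agents=500, n_runs=200))
```

Under these assumptions no agent ever explores on purpose, so an unlucky early sample from the better arm can leave the population on the worse arm indefinitely. The estimated failure rate depends on the chosen means, horizon, and prior, so it should be read as an illustration of the phenomenon and not as a statement of any particular result.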

The main purpose of this talk is to advertise the problem space. We survey the fundamental results, outline connections to and differences from related literatures, and discuss some new work and open questions.

*Note: This talk will be livestreamed only and will not be recorded or posted on SimonsTV.*