Playlist: 27 videos

Quantifying Uncertainty: Stochastic, Adversarial, and Beyond

0:49:11
Nika Haghtalab (UC Berkeley)
https://simons.berkeley.edu/talks/tbd-478

Robustness to changes in data is one of the main challenges faced by sequential machine learning and decision-making algorithms. Yet most of today's efficient, highly optimized deployed algorithms were designed to work well on fixed data sets and ultimately fail when data evolves in unpredictable or adversarial ways. Rather than designing robust algorithms from scratch, in this talk I will show how we can directly tap into existing non-robust optimization algorithms to design sequential learning algorithms that are robust to adversaries.

More formally, we will consider the design of online no-regret algorithms that are computationally efficient, assuming access to an offline Empirical Risk Minimization (ERM) oracle. We will see general approaches and principles for designing such algorithms. I will introduce a general-purpose oracle-efficient algorithm that is no-regret whenever an adaptive adversary is not fully worst-case (in the style of smoothed analysis). I will also compare these results to the fully worst-case setting, where oracle-efficient algorithms do not exist in full generality but do exist for a large class of economic settings, such as online auction design.
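
To make the oracle-efficient pattern concrete, here is a minimal sketch of Follow-the-Perturbed-Leader over a finite expert class, where the only access to the hypothesis class is an offline ERM call on perturbed cumulative losses. The expert setting, the exponential noise, and its scale are illustrative assumptions, not the construction from the talk.

```python
import numpy as np

def erm_oracle(cumulative_losses):
    """Offline ERM oracle: return the best single expert for the given loss totals."""
    return int(np.argmin(cumulative_losses))

def ftpl(loss_matrix, noise_scale, rng):
    """Follow-the-Perturbed-Leader over a finite expert class.

    Each round we feed perturbed cumulative past losses to the offline oracle,
    which is the oracle-efficient pattern: the online algorithm is just
    repeated calls to a non-robust offline optimizer.
    """
    T, K = loss_matrix.shape
    cumulative = np.zeros(K)
    total_loss = 0.0
    for t in range(T):
        perturbation = rng.exponential(scale=noise_scale, size=K)  # fresh noise each round
        choice = erm_oracle(cumulative - perturbation)             # perturbed leader
        total_loss += loss_matrix[t, choice]                       # suffer this round's loss
        cumulative += loss_matrix[t]                               # then observe the losses
    return total_loss

# Tiny usage example: 3 experts, 500 rounds of random losses in [0, 1].
rng = np.random.default_rng(0)
losses = rng.random((500, 3))
regret = ftpl(losses, noise_scale=np.sqrt(500), rng=rng) - losses.sum(axis=0).min()
print(f"regret vs. best fixed expert: {regret:.1f}")
```
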
0:46:00
Chen-Yu Wei (University of Southern California)
https://simons.berkeley.edu/talks/tbd-482

To evaluate the performance of a bandit learner in a changing environment, the standard notion of regret is insufficient. Instead, "dynamic regret" is a better measure that can evaluate the learner's ability to track the changes. How to achieve the optimal dynamic regret without prior knowledge of the number of times the environment changes had been open for a long time; it was recently resolved by Auer, Gajane, and Ortner in their COLT 2019 paper. We will discuss their consecutive sampling technique, which is rare in the bandit literature, and see how their idea can be elegantly generalized to a wide range of bandit/RL problems. Finally, we will discuss important open problems that remain in the area.
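
As a quick illustration of why the distinction matters, the toy computation below compares static and dynamic regret for a two-armed bandit whose means swap halfway through the horizon: a learner that commits to one arm has zero static regret but large dynamic regret. This is only a worked example of the two definitions, not the Auer-Gajane-Ortner algorithm.

```python
import numpy as np

# Two Bernoulli arms whose means swap at the midpoint of the horizon.
T = 10_000
means = np.tile([0.6, 0.4], (T, 1))
means[T // 2:] = [0.4, 0.6]                      # a single change point

# A learner that commits to arm 0 for the whole horizon.
chosen = np.zeros(T, dtype=int)
chosen_means = means[np.arange(T), chosen]

# Static regret compares to the best *fixed* arm in hindsight;
# dynamic regret compares to the best arm *at every round*.
static_regret = max(means[:, 0].sum(), means[:, 1].sum()) - chosen_means.sum()
dynamic_regret = means.max(axis=1).sum() - chosen_means.sum()

print(f"static regret:  {static_regret:.0f}")    # 0: both fixed arms tie on average
print(f"dynamic regret: {dynamic_regret:.0f}")   # ~0.1 * T: the change is never tracked
```
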
0:51:45
Csaba Szepesvári (University of Alberta, Google DeepMind)
https://simons.berkeley.edu/talks/tbd-483

At the dawn of the computer age in the 1960s, Bellman and his co-workers already found it useful to use linear function approximation to solve some multistage (or sequential) planning problems. Their approach was simple: just use function approximation to avoid state-space discretization and thus keep all computation poly-time, while also controlling for accuracy. However, the question of when and how this is possible has eluded researchers for at least 50 years. The partial results obtained suggested that the approximation spaces used need to have an intricate relationship to the problem to be solved, and that it may not be sufficient that the target function (say, the optimal value function) lies in this space. In this talk I will give an overview of recent work in this area, which essentially closed most of the open questions in the simplest finite-horizon setting. While the picture that emerges is interesting, most of the results are negative. The conclusion is that the approximation spaces had indeed better be special, or generality needs to be sacrificed in some other way.
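
For readers unfamiliar with the basic recipe under scrutiny, here is a minimal sketch of finite-horizon least-squares value iteration with linear features: Bellman backups are computed at a handful of anchor states, and a linear fit extrapolates them to the whole state space. The random MDP, feature map, and anchor choice are made-up assumptions; whether such a fit can be accurate is exactly the question the talk addresses.

```python
import numpy as np

rng = np.random.default_rng(2)
S, A, H, d = 50, 4, 10, 5                     # states, actions, horizon, feature dim
P = rng.dirichlet(np.ones(S), size=(S, A))    # P[s, a] = next-state distribution
R = rng.random((S, A))                        # rewards in [0, 1]
Phi = rng.standard_normal((S, d))             # feature map: one row phi(s) per state

theta = np.zeros((H + 1, d))                  # V_h(s) is approximated by Phi[s] @ theta[h]
for h in range(H - 1, -1, -1):
    # Bellman backup targets at a small set of anchor states only; the linear
    # least-squares fit then extrapolates the backup to the full state space.
    anchors = rng.choice(S, size=2 * d, replace=False)
    V_next = Phi @ theta[h + 1]
    targets = (R[anchors] + P[anchors] @ V_next).max(axis=1)
    theta[h], *_ = np.linalg.lstsq(Phi[anchors], targets, rcond=None)

print("approximate values at h=0 for the first five states:", (Phi @ theta[0])[:5].round(2))
```
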
0:47:35
Daniel Russo (Columbia University)
https://simons.berkeley.edu/talks/tbd-484

Multi-armed bandit algorithms can offer enormous efficiency benefits in problems where learning to make effective decisions requires careful experimentation. They achieve this through adaptivity: early feedback is used to identify competitive parts of the decision space, and future experimentation effort is focused there. Unfortunately, due to this adaptivity, these algorithms risk confounding in problems where nonstationary contexts influence performance. As a result, many practitioners resort to non-adaptive randomized experimentation, providing robustness but forgoing the efficiency benefits. We develop a new model to study this issue and propose deconfounded Thompson sampling, which involves a simple modification to one of the leading multi-armed bandit algorithms. We argue that this method strikes a delicate balance, allowing one to build in robustness to nonstationarity while, when possible, preserving the efficiency benefits of adaptivity.
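
For reference, here is a minimal sketch of the baseline being modified: vanilla Beta-Bernoulli Thompson sampling. The deconfounded variant from the talk is not reproduced here, and the arm means and prior below are illustrative assumptions.

```python
import numpy as np

def thompson_sampling(true_means, T, rng):
    """Vanilla Beta-Bernoulli Thompson sampling; returns the total reward earned."""
    K = len(true_means)
    successes = np.ones(K)                       # Beta(1, 1) prior for every arm
    failures = np.ones(K)
    total = 0.0
    for _ in range(T):
        samples = rng.beta(successes, failures)  # one posterior sample per arm
        arm = int(np.argmax(samples))            # play the arm with the best sample
        reward = rng.random() < true_means[arm]  # Bernoulli reward
        successes[arm] += reward
        failures[arm] += 1 - reward
        total += reward
    return total

rng = np.random.default_rng(3)
avg = thompson_sampling([0.3, 0.5, 0.7], T=5_000, rng=rng) / 5_000
print(f"average reward: {avg:.3f}")              # close to 0.7 once the best arm dominates
```
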
0:46:00
Emma Brunskill (Stanford University)
https://simons.berkeley.edu/talks/tbd-485

Almost all work in reinforcement learning focuses on one of two settings: the online case, where an algorithm can gather data and near-immediately update its policy, and the offline case, where an algorithm must learn exclusively from existing historical data. Yet many applied areas, including education, healthcare and development economics, have a history of running experiments to assess the impact of different potential interventions, for the purpose of selecting decision policies for future wider-scale deployment. We can view this as the batch or limited-deployment setting, where one or more explicit experiments may be conducted in order to gather information to identify near-optimal policies. Toward progress in this setting, we introduce what is, to our knowledge, the first algorithm for linear contextual multi-armed bandits that aims to minimize the sample complexity needed to learn a near-optimal policy. We provide theoretical bounds and encouraging empirical results, even when the domain violates the linearity assumptions. This is joint work with Andrea Zanette, Kefan Dong, Jonathan Lee and Matt Joerke.
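
To fix ideas about the limited-deployment pipeline (this is only an illustration of the setting, not the algorithm from the talk), the sketch below collects one batch with a uniform logging policy in a linear contextual bandit, fits a ridge-regression reward model per action, and deploys the greedy policy afterwards; the dimensions, noise level and regularizer are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
d, K, n = 8, 5, 2_000                           # context dim, actions, batch size
theta_star = rng.standard_normal((K, d))        # unknown per-action reward parameters

# One explicit experiment: a uniform logging policy collects the batch.
contexts = rng.standard_normal((n, d))
actions = rng.integers(K, size=n)
rewards = (contexts * theta_star[actions]).sum(axis=1) + 0.1 * rng.standard_normal(n)

# Fit one ridge-regression reward estimate per action from the logged batch.
theta_hat = np.zeros((K, d))
for a in range(K):
    X, y = contexts[actions == a], rewards[actions == a]
    theta_hat[a] = np.linalg.solve(X.T @ X + np.eye(d), X.T @ y)

def policy(x):
    """Deployed policy: act greedily with respect to the fitted reward model."""
    return int(np.argmax(theta_hat @ x))

x_new = rng.standard_normal(d)
print("chosen action:", policy(x_new), "| optimal action:", int(np.argmax(theta_star @ x_new)))
```
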
0:53:30
Wen Sun (Cornell University)
https://simons.berkeley.edu/talks/tbd-486

Offline Reinforcement Learning (RL) is a learning paradigm in which the RL agent learns only from a pre-collected static dataset and cannot interact with the environment any further. Offline RL is a promising approach for safety-critical applications where randomized exploration is not safe. In this talk, we study offline RL in large-scale settings with rich function approximation. In the first part of the talk, we will study the generalization property in offline RL and give a general model-based offline RL algorithm that provably generalizes in large-scale Markov Decision Processes. Our approach is also robust in the sense that as long as there is a high-quality policy whose traces are covered by the offline data, our algorithm will find it. In the second part of the talk, we consider the offline Imitation Learning (IL) setting, where the RL agent has an additional set of high-quality expert demonstrations. In this setting, we give an IL algorithm that learns with polynomial sample complexity and achieves state-of-the-art performance on standard continuous-control robotics benchmarks.
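
As a point of reference for the offline imitation-learning setting, here is a minimal sketch of behavior cloning, the standard baseline that learns a policy by supervised regression on expert demonstrations; it is not the algorithm from the talk, and the linear policy class and synthetic data are assumptions for illustration.

```python
import numpy as np

def behavior_cloning(expert_states, expert_actions, ridge=1e-3):
    """Fit a linear policy to expert demonstrations by ridge regression."""
    X, Y = np.asarray(expert_states), np.asarray(expert_actions)
    W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)
    return lambda s: s @ W                      # deterministic policy: action = W^T s

# Synthetic check: recover a linear expert from 1,000 noisy demonstrations.
rng = np.random.default_rng(5)
W_expert = rng.standard_normal((6, 2))          # 6-dim states, 2-dim actions
states = rng.standard_normal((1_000, 6))
actions = states @ W_expert + 0.01 * rng.standard_normal((1_000, 2))
pi = behavior_cloning(states, actions)
print(f"max action error: {np.abs(pi(states) - states @ W_expert).max():.3f}")
```
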