Playlist: 22 videos
Data-Driven Decision Processes Boot Camp
1:09:10
Shipra Agrawal (Columbia University)
https://simons.berkeley.edu/talks/stochastic-bandits-foundations-and-current-perspectives
Data-Driven Decision Processes Boot Camp
This talk will focus on the main algorithms for stochastic bandits, a fundamental model for sequential learning in which the rewards of different actions are drawn independently and identically from fixed distributions. We will cover the two principal algorithms, Upper Confidence Bound and Thompson Sampling, and then discuss how they can be adapted to incorporate various additional constraints.
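For readers who want something concrete alongside this abstract, below is a minimal sketch of the first algorithm named above, UCB1, run on a simulated Bernoulli bandit. The arm means, horizon, and seed are made-up illustration parameters, not material from the talk.

```python
import numpy as np

def ucb1(means, horizon, seed=0):
    """Minimal UCB1 sketch on a simulated Bernoulli bandit with the given arm means."""
    rng = np.random.default_rng(seed)
    k = len(means)
    counts = np.zeros(k)   # number of pulls of each arm
    totals = np.zeros(k)   # cumulative reward of each arm
    regret, best = 0.0, max(means)
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1    # pull every arm once to initialize the estimates
        else:
            # optimistic index: empirical mean plus a confidence radius
            index = totals / counts + np.sqrt(2 * np.log(t) / counts)
            arm = int(np.argmax(index))
        reward = rng.binomial(1, means[arm])
        counts[arm] += 1
        totals[arm] += reward
        regret += best - means[arm]
    return regret

# Regret should grow roughly logarithmically in the horizon on such instances.
print(ucb1([0.3, 0.5, 0.7], horizon=10_000))
```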
1:01:26
Shipra Agrawal (Columbia University)
https://simons.berkeley.edu/talks/stochastic-bandits-foundations-and-current-perspectives-0
Data-Driven Decision Processes Boot Camp
This talk will focus on the main algorithms for stochastic bandits, a fundamental model for sequential learning in which the rewards of different actions are drawn independently and identically from fixed distributions. We will cover the two principal algorithms, Upper Confidence Bound and Thompson Sampling, and then discuss how they can be adapted to incorporate various additional constraints.
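Since this second part covers Thompson Sampling, here is a comparable sketch using independent Beta(1, 1) priors on a Bernoulli bandit. Again, the instance and constants are assumptions made for the example, not taken from the talk.

```python
import numpy as np

def thompson_sampling(means, horizon, seed=0):
    """Thompson Sampling sketch for Bernoulli rewards with Beta(1, 1) priors on each arm."""
    rng = np.random.default_rng(seed)
    k = len(means)
    alpha = np.ones(k)   # posterior Beta parameters: observed successes + 1
    beta = np.ones(k)    # posterior Beta parameters: observed failures + 1
    regret, best = 0.0, max(means)
    for _ in range(horizon):
        samples = rng.beta(alpha, beta)      # one posterior sample per arm
        arm = int(np.argmax(samples))        # play the arm that looks best under the sample
        reward = rng.binomial(1, means[arm])
        alpha[arm] += reward
        beta[arm] += 1 - reward
        regret += best - means[arm]
    return regret

print(thompson_sampling([0.3, 0.5, 0.7], horizon=10_000))
```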
1:21:25
Haipeng Luo (USC)
https://simons.berkeley.edu/talks/adversarial-bandits-theory-and-algorithms
Data-Driven Decision Processes Boot Camp
The adversarial (a.k.a. non-stochastic) multi-armed bandit problem is an influential marriage between the online learning literature, which concerns sequential decision making without distributional assumptions, and the bandit literature, which concerns learning from partial-information feedback. This tutorial will give an overview of the theory and algorithms on this topic, starting from classical algorithms and their analysis and then moving on to recent advances in data-dependent regret guarantees, structured bandits, bandits with switching costs, combining bandit algorithms, and more. Special focus will be given to highlighting the similarities and differences between online learning with full-information feedback and online learning with bandit feedback.
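The abstract does not single out one algorithm, but the classical starting point for adversarial bandits is Exp3. The sketch below is one standard loss-based variant with importance-weighted loss estimates; the learning-rate tuning and the toy adversary are illustrative assumptions.

```python
import numpy as np

def exp3(loss_fn, n_arms, horizon, seed=0):
    """Sketch of Exp3 for adversarial bandits with losses in [0, 1]."""
    rng = np.random.default_rng(seed)
    eta = np.sqrt(np.log(n_arms) / (n_arms * horizon))   # standard tuning, up to constants
    cum_est = np.zeros(n_arms)    # cumulative importance-weighted loss estimates
    total_loss = 0.0
    for t in range(horizon):
        # exponential weights over estimated losses (shifted for numerical stability)
        probs = np.exp(-eta * (cum_est - cum_est.min()))
        probs /= probs.sum()
        arm = rng.choice(n_arms, p=probs)
        loss = loss_fn(t, arm)                 # only the chosen arm's loss is revealed
        cum_est[arm] += loss / probs[arm]      # unbiased estimate of the full loss vector
        total_loss += loss
    return total_loss

# Toy (hypothetical) adversary: arm 0 is good in even rounds, arm 1 in odd rounds.
print(exp3(lambda t, a: float(a != t % 2), n_arms=2, horizon=10_000))
```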
1:15:10
Thodoris Lykouris (MIT)
https://simons.berkeley.edu/talks/bridging-stochastic-and-adversarial-bandits
Data-Driven Decision Processes Boot Camp
As discussed in the previous talks, the main paradigms for online learning with partial information posit either that rewards are drawn i.i.d. across rounds from fixed distributions (stochastic bandits) or that they are completely arbitrary (adversarial bandits). This tutorial will focus on recent developments in hybrid models, and the corresponding algorithms, that aim to shed light on the space between these two extremes.
0:57:15
Weina Wang (Carnegie Mellon University)
https://simons.berkeley.edu/talks/fundamentals-markov-decision-processes
Data-Driven Decision Processes Boot Camp
This part of the tutorial covers the fundamentals of Markov decision processes, setting the stage for the discussion of reinforcement learning in the next two parts. We will present the Bellman equation and its key properties, and show how they give rise to commonly used computational methods such as value iteration and policy iteration.
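As a concrete companion to the Bellman equation and value iteration mentioned above, here is a minimal value-iteration sketch on a tiny, made-up tabular MDP with discounted rewards. The transition probabilities and rewards are invented for illustration.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration sketch. P[a, s, s'] are transition probabilities, R[s, a] expected rewards."""
    V = np.zeros(P.shape[1])
    while True:
        # Bellman optimality update: Q(s, a) = R(s, a) + gamma * sum_{s'} P(s' | s, a) V(s')
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)   # optimal values and a greedy (optimal) policy
        V = V_new

# Tiny 2-state, 2-action example with made-up dynamics and rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # action 0: P[0, s, s']
              [[0.5, 0.5], [0.0, 1.0]]])   # action 1: P[1, s, s']
R = np.array([[0.0, 1.0],                  # R[s, a]
              [2.0, 0.0]])
values, policy = value_iteration(P, R)
print(values, policy)
```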
1:05:00
Rayadurgam Srikant (University of Illinois Urbana-Champaign)
https://simons.berkeley.edu/talks/foundations-rl
Data-Driven Decision Processes Boot Camp
This part of the tutorial will build on the first part, on MDPs, and focus on reinforcement learning. We will present three algorithms: TD learning, Q-learning, and natural policy gradient, and outline the key ideas behind obtaining finite-time performance bounds for each of these algorithms.
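Of the three algorithms named, tabular Q-learning is the simplest to sketch. The environment interface (env_reset, env_step) and the step-size and exploration constants below are assumptions made for the example, not taken from the talk.

```python
import numpy as np

def q_learning(env_reset, env_step, n_states, n_actions,
               episodes=500, gamma=0.95, alpha=0.1, epsilon=0.1, seed=0):
    """Tabular Q-learning sketch with an epsilon-greedy behavior policy.

    env_reset() -> initial state; env_step(s, a) -> (next_state, reward, done)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env_reset(), False
        while not done:
            # epsilon-greedy action selection
            a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
            s_next, r, done = env_step(s, a)
            # off-policy TD target: bootstrap with the max over next-state actions
            target = r + (0.0 if done else gamma * Q[s_next].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```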
1:10:55
Rayadurgam Srikant (University of Illinois Urbana-Champaign)
https://simons.berkeley.edu/talks/foundations-rl-0
Data-Driven Decision Processes Boot Camp
This part of the tutorial will build on the first part, on MDPs, and focus on reinforcement learning. We will present three algorithms: TD learning, Q-learning, and natural policy gradient, and outline the key ideas behind obtaining finite-time performance bounds for each of these algorithms.
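This second part continues the same material; for completeness, here is a sketch of TD(0) policy evaluation, the first of the three algorithms named, against the same hypothetical environment interface.

```python
import numpy as np

def td0_evaluation(policy, env_reset, env_step, n_states,
                   episodes=500, gamma=0.95, alpha=0.1):
    """TD(0) sketch: estimate the value function of a fixed policy from sampled transitions.

    policy(s) -> action; env_reset() -> initial state; env_step(s, a) -> (next_state, reward, done)."""
    V = np.zeros(n_states)
    for _ in range(episodes):
        s, done = env_reset(), False
        while not done:
            a = policy(s)                                        # fixed policy being evaluated
            s_next, r, done = env_step(s, a)
            # one-step bootstrapped TD error: r + gamma * V(s') - V(s)
            td_error = r + (0.0 if done else gamma * V[s_next]) - V[s]
            V[s] += alpha * td_error
            s = s_next
    return V
```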
1:16:35
Christina Lee Yu (Cornell University) & Sean Sinclair (Cornell University)
https://simons.berkeley.edu/talks/online-reinforcement-learning-and-regret
Data-Driven Decision Processes Boot Camp
This tutorial will focus on the online learning perspective on reinforcement learning, where the model is unknown and one incurs regret for the actions selected during the learning process itself. Building on the preceding talks, as well as yesterday's tutorials on multi-armed bandits, we will focus on the challenges introduced by analyzing regret under Markovian dynamics. We will also discuss the interaction between learning and function approximation, the role of structure, and existing challenges and open problems.
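The abstract surveys this area rather than one method, but a representative algorithm with a regret guarantee is episodic Q-learning with UCB-style exploration bonuses (in the spirit of Jin et al., 2018). The sketch below simplifies the constants, and the episodic environment interface env_reset/env_step is an assumption for the example.

```python
import numpy as np

def optimistic_q_learning(env_reset, env_step, n_states, n_actions, H,
                          episodes=1000, bonus_scale=1.0):
    """Sketch of episodic Q-learning with Hoeffding-style bonuses (constants simplified).

    env_reset() -> initial state; env_step(h, s, a) -> (next_state, reward in [0, 1])."""
    Q = np.full((H, n_states, n_actions), float(H))   # optimistic initialization
    V = np.zeros((H + 1, n_states))
    V[:H] = H
    N = np.zeros((H, n_states, n_actions))            # visit counts per (step, state, action)
    for _ in range(episodes):
        s = env_reset()
        for h in range(H):
            a = int(Q[h, s].argmax())                 # act greedily w.r.t. the optimistic Q
            s_next, r = env_step(h, s, a)
            N[h, s, a] += 1
            t = N[h, s, a]
            step = (H + 1) / (H + t)                  # decaying step size
            bonus = bonus_scale * np.sqrt(H**3 * np.log(episodes + 1) / t)
            Q[h, s, a] = (1 - step) * Q[h, s, a] + step * (r + V[h + 1, s_next] + bonus)
            V[h, s] = min(H, Q[h, s].max())           # keep the value estimate bounded by H
            s = s_next
    return Q
```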
0:56:00
Anupam Gupta (Carnegie Mellon University)
https://simons.berkeley.edu/talks/competitive-analysis-online-algorithms-parts-1-and-2
Data-Driven Decision Processes Boot Camp
The study of online algorithms is a vibrant area of research in computer science concerned with decision-making under uncertainty. One of the two dominant frameworks is competitive analysis, which compares the performance of an online algorithm making decisions without knowledge of the future to the performance of the best (dynamic) sequence of decisions in hindsight, as opposed to the best static decision. In these two talks, I will define the model and survey some representative problems, solution techniques, and proof strategies used in the competitive analysis of online algorithms.
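A standard warm-up for competitive analysis (chosen here for illustration; the talk's own examples may differ) is ski rental: rent for 1 per day or buy once for B, without knowing how many days you will ski. The deterministic break-even rule below is 2-competitive.

```python
def ski_rental(buy_price: int, ski_days: int) -> tuple[int, int]:
    """Break-even ski-rental rule: rent until the rent paid would reach buy_price, then buy.

    Returns (online cost, offline optimal cost); their ratio is at most 2 - 1/buy_price."""
    if ski_days < buy_price:
        online = ski_days                        # the break-even point is never reached
    else:
        online = (buy_price - 1) + buy_price     # rented buy_price - 1 days, then bought
    offline = min(ski_days, buy_price)           # the offline optimum knows ski_days in advance
    return online, offline

for days in (3, 10, 100):
    online, offline = ski_rental(buy_price=10, ski_days=days)
    print(days, online, offline, online / offline)
```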
1:03:51
Anupam Gupta (Carnegie Mellon University)
https://simons.berkeley.edu/talks/competitive-analysis-online-algorithms-parts-1-and-2-0
Data-Driven Decision Processes Boot Camp
The study of online algorithms is a vibrant area of research in computer science concerned with decision-making under uncertainty. One of the two dominant frameworks is competitive analysis, which compares the performance of an online algorithm making decisions without knowledge of the future to the performance of the best (dynamic) sequence of decisions in hindsight, as opposed to the best static decision. In these two talks, I will define the model and survey some representative problems, solution techniques, and proof strategies used in the competitive analysis of online algorithms.
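This second part continues the same survey. Another canonical example in this framework is paging: an online eviction rule such as LRU is measured against the offline optimum, which evicts the page requested farthest in the future (Belady's rule). The request sequence below is made up; the sketch simply counts cache misses for both rules.

```python
def lru_misses(requests, k):
    """Cache misses under Least-Recently-Used eviction with a cache of size k."""
    cache, misses = [], 0            # front of the list = most recently used
    for page in requests:
        if page in cache:
            cache.remove(page)       # hit: refresh the page's recency
        else:
            misses += 1
            if len(cache) == k:
                cache.pop()          # evict the least recently used page
        cache.insert(0, page)
    return misses

def belady_misses(requests, k):
    """Cache misses under the offline optimal rule: evict the page used farthest in the future."""
    cache, misses = set(), 0
    for i, page in enumerate(requests):
        if page in cache:
            continue
        misses += 1
        if len(cache) == k:
            def next_use(p):         # index of the next request for p (infinity if none)
                for j in range(i + 1, len(requests)):
                    if requests[j] == p:
                        return j
                return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return misses

requests = [1, 2, 3, 1, 4, 2, 5, 1, 2, 3, 4, 5]   # hypothetical request sequence
print(lru_misses(requests, k=3), belady_misses(requests, k=3))
```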