Reinforcement Learning and Markov Decision Processes

Lecture 1: Reinforcement Learning and Markov Decision Processes I 
Lecture 2: Reinforcement Learning and Markov Decision Processes II 

This series of talks is part of the Algorithms and Uncertainty Boot Camp. Videos for each talk will be available through the links above.

Speaker: Yishay Mansour, Tel Aviv University

Reinforcement learning studies a dynamic environment in which the learner's actions influence the state of the environment, which in turn influences the learner's future rewards. The goal of the learner is to maximize its long-term reward. The most common model for reinforcement learning is the Markov Decision Process (MDP).
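
One standard way to write this model down is sketched below. The notation and the choice of a discounted, infinite-horizon objective are assumptions made here for illustration; they are not taken from the lectures, which may instead use a finite-horizon or average-reward criterion.

```latex
% An MDP as a tuple: states, actions, transition kernel, reward function.
\[
  M = (S, A, P, r), \qquad
  P(s' \mid s, a) = \Pr[\, s_{t+1} = s' \mid s_t = s,\ a_t = a \,], \qquad
  r : S \times A \to \mathbb{R}.
\]
% A (stationary, deterministic) policy maps states to actions.
\[
  \pi : S \to A, \qquad a_t = \pi(s_t), \qquad s_{t+1} \sim P(\cdot \mid s_t, a_t).
\]
% Long-term reward under an assumed discount factor 0 <= gamma < 1.
\[
  \max_{\pi} \; \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right].
\]
```

The action taken at time t affects the distribution of the next state, and through it all later rewards, which is the feedback loop described above.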

I will give a short tutorial on reinforcement learning and MDPs. (I will assume very little about the audience's background.) I will try to cover the following topics:
1. Mathematical model of Markov Decision Processes (MDPs)
2. Planning in MDPs: computing an optimal policy
3. Learning in (unknown) MDPs
4. MDPs with large (exponential) state spaces
5. Partially Observable MDPs (time permitting).
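
As a rough illustration of topic 2 (planning when the model is known), here is a minimal value iteration sketch. Value iteration is one standard planning method, though not necessarily the one presented in the lectures. The two-state MDP, the discount factor `gamma`, and all names below are hypothetical and chosen only to keep the example small.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Approximate the optimal values and a greedy policy of a known MDP.

    P[a][s][t] is the probability of moving from state s to state t
    under action a; R[s][a] is the expected immediate reward.
    """
    n_states = len(R)
    n_actions = len(R[0])

    def q_value(V, s, a):
        # One-step lookahead: immediate reward plus discounted next value.
        return R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))

    V = [0.0] * n_states
    for _ in range(max_iters):
        # Bellman optimality backup applied to every state.
        V_new = [max(q_value(V, s, a) for a in range(n_actions))
                 for s in range(n_states)]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break

    # Greedy policy with respect to the final value estimates.
    policy = [max(range(n_actions), key=lambda a: q_value(V, s, a))
              for s in range(n_states)]
    return V, policy


# Hypothetical toy MDP (an assumption, not from the talk):
# action 0 = "stay", action 1 = "move" to the other state.
# Only staying in state 1 pays a reward of 1.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: move
]
R = [
    [0.0, 0.0],  # rewards in state 0 for actions 0, 1
    [1.0, 0.0],  # rewards in state 1 for actions 0, 1
]

V, policy = value_iteration(P, R, gamma=0.9)
print(V, policy)
```

In this toy problem the greedy policy moves out of state 0 and then stays in state 1, since that is the only place reward is collected.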

This tutorial is intended to be interactive, with audience participation.