Abstract

Reinforcement learning studies a dynamic environment in which the learner's actions influence the state of the environment, which in turn influences the learner's future rewards.  The goal of the learner is to maximize its long-term reward.  The standard model for reinforcement learning is the Markov Decision Process (MDP).

I will give a short tutorial on reinforcement learning and MDPs.  (I will assume very little about the background of the audience.) I will try to cover the following topics:
1. Mathematical model of Markov Decision Processes (MDPs)
2. Planning in MDPs: computing an optimal policy
3. Learning in (unknown) MDPs
4. MDPs with large (exponential) state spaces
5. Partially Observable MDPs (time permitting)

This tutorial is intended to be interactive, with audience participation.

The second session of this mini course will take place on Wednesday, August 24th, 2016, from 4:30 pm to 5:15 pm.
