Stochastic approximation algorithms are used to approximate solutions to fixed-point equations that involve expectations of functions with respect to possibly unknown distributions. The most famous examples today are the TD- and Q-learning algorithms. This three-hour tutorial lecture series will consist of two parts:

  1. The basics: an overview of stochastic approximation, with a focus on optimizing the rate of convergence. The algorithm that achieves the best rate is analogous to the Newton-Raphson algorithm.
  2. Leftovers and review from Part 1, and applications to reinforcement learning for discounted-cost optimal control and optimal stopping problems. The theory from Part 1 leads to the new Zap Q-learning algorithm. Analysis suggests that its transient behavior closely matches a deterministic Newton-Raphson implementation, and numerical experiments confirm very fast convergence.
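The Newton-Raphson analogy in Part 1 can be illustrated with a small sketch. The specific root-finding problem and all names below are illustrative choices, not taken from the lectures: a vanilla Robbins-Monro recursion is compared with a Newton-Raphson-flavored variant (in the spirit of Zap algorithms) that maintains a running estimate of the mean derivative and preconditions each step by its inverse.

```python
import random

# Illustrative fixed-point problem (an assumption for this sketch): solve
# E[f(theta, W)] = 0 with f(theta, w) = 0.1 * (w - theta) and W ~ N(2, 1),
# so the root is theta* = 2.  The small gain 0.1 makes the plain 1/n step
# size too timid, which is where Newton-Raphson-style preconditioning helps.

def f(theta, w):
    return 0.1 * (w - theta)

def robbins_monro(n_iter=50_000, seed=0):
    """Vanilla stochastic approximation: theta += (1/n) * f(theta, W_n)."""
    rng = random.Random(seed)
    theta = 0.0
    for n in range(1, n_iter + 1):
        w = rng.gauss(2.0, 1.0)
        theta += (1.0 / n) * f(theta, w)
    return theta

def zap_style(n_iter=50_000, seed=0):
    """Newton-Raphson-flavored variant: track a running estimate A of the
    mean derivative d/dtheta E[f(theta, W)] and scale each step by 1/A."""
    rng = random.Random(seed)
    theta, A = 0.0, -1.0
    for n in range(1, n_iter + 1):
        w = rng.gauss(2.0, 1.0)
        A += (1.0 / n) * (-0.1 - A)       # here df/dtheta = -0.1 exactly
        theta -= (1.0 / n) * f(theta, w) / A
    return theta
```

With the same number of samples, the preconditioned recursion lands much closer to theta* = 2 than the plain one: rescaling by the inverse derivative restores an effective unit gain, which is the mechanism behind the optimal-rate claim above.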

The first session of this mini-course will take place on Wednesday, March 7, 2:30 – 4:00 pm; the second session will take place on Friday, March 9, 10:00 – 11:30 am. All talks will be recorded.
