Adithya Devraj (University of Florida)
As you learned in the previous talk, stochastic approximation (SA) algorithms are prone to slow convergence, which was explained by the fact that the asymptotic (CLT) variance is typically infinite. This was one of the main motivations for the introduction of the Zap Q-learning algorithm: a two time-scale version of Stochastic Newton-Raphson that has optimal variance.
While the Zap algorithm enjoys fast convergence, each iteration involves a matrix inversion, which is computationally costly. This motivated us to look at alternatives, resulting in two new classes of algorithms:
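To make the two time-scale structure concrete, here is a minimal sketch of a Zap-style stochastic Newton-Raphson update on a toy linear root-finding problem. The problem data (`A`, `b`), the noise model, and the step-size schedules are illustrative assumptions for this sketch, not the algorithm as specified in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy root-finding problem: find theta with E[f(theta, X)] = 0, where
# f(theta, x) = A @ theta - b + noise. (An illustrative stand-in for the
# mean-field of an SA recursion; A and b are assumptions for the sketch.)
A = np.array([[2.0, 0.5], [0.0, 1.0]])
b = np.array([1.0, -1.0])

def f(theta):
    # Noisy observation of the mean vector field
    return A @ theta - b + 0.1 * rng.standard_normal(2)

def jacobian_sample(theta):
    # Noisy observation of the Jacobian (constant here, since f is linear)
    return A + 0.1 * rng.standard_normal((2, 2))

theta = np.zeros(2)
A_hat = np.eye(2)              # running estimate of the mean Jacobian
for n in range(1, 5001):
    alpha = 1.0 / n            # slow step size for theta
    gamma = 1.0 / n ** 0.7     # faster step size for A_hat (gamma >> alpha)
    A_hat += gamma * (jacobian_sample(theta) - A_hat)
    # Newton-Raphson-style step: requires inverting the Jacobian estimate,
    # which is the per-iteration cost that motivates PolSA and NeSA
    theta -= alpha * np.linalg.solve(A_hat, f(theta))

theta_star = np.linalg.solve(A, b)  # true root, for comparison
print(theta, theta_star)
```

The `np.linalg.solve` call is the matrix inversion performed at every iteration; the faster time scale `gamma` lets `A_hat` track the mean Jacobian while `theta` evolves slowly.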
(i) The PolSA algorithm that is based on Polyak’s momentum technique, but with a specialized matrix momentum, and
(ii) The NeSA algorithm based on Nesterov’s acceleration technique.
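For contrast with the Newton-Raphson update, here is a sketch of the classical heavy-ball momentum recursion that PolSA generalizes, again on an illustrative linear problem. Note the scalar momentum coefficient `mu` below is only the textbook version; PolSA replaces it with a specialized matrix momentum, whose form is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear root-finding problem: E[f(theta, X)] = A @ theta - b,
# with A and b chosen purely for illustration.
A = np.array([[2.0, 0.5], [0.0, 1.0]])
b = np.array([1.0, -1.0])

def f(theta):
    return A @ theta - b + 0.1 * rng.standard_normal(2)

theta = np.zeros(2)
theta_prev = theta.copy()
mu = 0.5   # scalar momentum; PolSA uses a matrix gain in its place
for n in range(1, 20001):
    alpha = 1.0 / n
    # Heavy-ball update: gradient-style step plus momentum term.
    # No matrix inversion is needed at any iteration.
    theta_new = theta - alpha * f(theta) + mu * (theta - theta_prev)
    theta_prev, theta = theta, theta_new

theta_star = np.linalg.solve(A, b)  # true root, for comparison
print(theta, theta_star)
```

The appeal of this family is visible in the loop body: each iteration costs only vector operations, avoiding the per-iteration matrix inversion of the Zap update.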
The analysis requires entirely new techniques, since the algorithms fall far outside the classical SA framework of Robbins and Monro. An approach taken in our new work is via coupling: conditions are established under which the parameter estimates obtained using the PolSA algorithm couple with those obtained using a Newton-Raphson SA algorithm (a cousin of Zap).