Abstract

We establish a theoretical comparison between the asymptotic mean-squared error of Double Q-learning and that of Q-learning. Building on prior work that characterizes the asymptotic mean-squared error of linear stochastic approximation via Lyapunov equations, we show that the asymptotic mean-squared error of Double Q-learning is exactly equal to that of Q-learning if Double Q-learning uses twice the learning rate of Q-learning and outputs the average of its two estimators. We also illustrate practical implications of this theoretical observation through simulations.
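
The comparison setup described above can be sketched in code. The following is a minimal illustration only, not the experimental code behind the simulations: it assumes a tabular setting, a constant step size, a uniform random behavior policy, and a small randomly generated MDP. The names `make_mdp`, `step`, `q_learning`, and `double_q_learning` are hypothetical. It shows the two ingredients of the result: Double Q-learning run at twice the learning rate of Q-learning, with the average of its two estimators as output.

```python
import random

def make_mdp(n_states, n_actions, rng):
    # Hypothetical random MDP (an assumption of this sketch):
    # P[s][a] is a next-state distribution, R[s][a] is a mean reward.
    P, R = [], []
    for _ in range(n_states):
        p_row, r_row = [], []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            p_row.append([x / total for x in w])
            r_row.append(rng.random())
        P.append(p_row)
        R.append(r_row)
    return P, R

def step(P, R, s, a, rng):
    # Sample a noisy reward and a next state for the pair (s, a).
    s_next = rng.choices(range(len(P)), weights=P[s][a])[0]
    r = R[s][a] + rng.gauss(0.0, 1.0)
    return r, s_next

def argmax(values):
    # Index of the largest entry (first one on ties).
    return max(range(len(values)), key=lambda i: values[i])

def q_learning(P, R, alpha, gamma, n_steps, rng):
    n_s, n_a = len(P), len(P[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    s = 0
    for _ in range(n_steps):
        a = rng.randrange(n_a)  # uniform behavior policy (assumption)
        r, s_next = step(P, R, s, a, rng)
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next
    return Q

def double_q_learning(P, R, alpha, gamma, n_steps, rng):
    n_s, n_a = len(P), len(P[0])
    QA = [[0.0] * n_a for _ in range(n_s)]
    QB = [[0.0] * n_a for _ in range(n_s)]
    s = 0
    for _ in range(n_steps):
        a = rng.randrange(n_a)  # uniform behavior policy (assumption)
        r, s_next = step(P, R, s, a, rng)
        # Update one estimator chosen at random; the greedy action comes
        # from the updated estimator, its value from the other one.
        upd, other = (QA, QB) if rng.random() < 0.5 else (QB, QA)
        a_star = argmax(upd[s_next])
        upd[s][a] += alpha * (r + gamma * other[s_next][a_star] - upd[s][a])
        s = s_next
    # Output the average of the two estimators.
    return [[0.5 * (QA[i][j] + QB[i][j]) for j in range(n_a)]
            for i in range(n_s)]

rng = random.Random(0)
P, R = make_mdp(n_states=3, n_actions=2, rng=rng)
alpha = 0.01  # illustrative value, not taken from the paper
Q_single = q_learning(P, R, alpha, 0.9, 5000, rng)
Q_double = double_q_learning(P, R, 2 * alpha, 0.9, 5000, rng)  # twice the rate
```

A single run of this kind only produces two estimates. Comparing mean-squared errors would additionally require the optimal Q-function of the MDP and averaging the squared error over many independent runs, and the equality stated above is an asymptotic one under the assumptions of the analysis.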
