Abstract

It is well known that Q-learning can easily become unstable when used with nonlinear function approximation. Existing work alleviates these instabilities by using double Q-functions or by simply taking the minimum of two Q-functions. However, such stabilization also discards useful learning signal. In this work we investigate Q-learning with weighted Bellman losses, where the weights reflect uncertainty estimates of the target Q-values. Our experiments with SAC and Rainbow DQN show stable and faster learning. Our approach can also be easily augmented with UCB exploration to further speed up learning.
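As a rough illustration of the idea, the sketch below computes a Bellman loss whose per-transition weight shrinks when an ensemble of target Q-networks disagrees about the target value. The function name `weighted_bellman_loss`, the use of an ensemble as the uncertainty estimate, the sigmoid-based weighting, and the `temperature` parameter are assumptions for illustration only, not the paper's exact formulation.

```python
# A minimal sketch (assumed, not the exact method) of an uncertainty-weighted
# Bellman loss: the weight of each transition is scaled down when an ensemble
# of target Q-networks disagrees about the target value.
import torch
import torch.nn.functional as F


def weighted_bellman_loss(q_net, target_ensemble, batch, gamma=0.99, temperature=10.0):
    """Bellman loss weighted by target-Q uncertainty (illustrative only).

    `target_ensemble` is assumed to be a list of target Q-networks whose
    disagreement (standard deviation) serves as the uncertainty estimate.
    """
    obs, act, rew, next_obs, done = batch  # tensors; shapes assumed (B, ...) / (B,)

    with torch.no_grad():
        # Each target net evaluates the next state; max over actions (DQN-style).
        next_qs = torch.stack([net(next_obs).max(dim=1).values
                               for net in target_ensemble])      # (E, B)
        target_mean = next_qs.mean(dim=0)                        # (B,)
        target_std = next_qs.std(dim=0)                          # (B,)
        target = rew + gamma * (1.0 - done) * target_mean
        # Higher uncertainty -> smaller weight; one plausible choice of mapping.
        weight = torch.sigmoid(-target_std * temperature) + 0.5  # in (0.5, 1.5)

    q = q_net(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    # Per-sample squared error scaled by the uncertainty-based weight.
    return (weight * F.mse_loss(q, target, reduction="none")).mean()
```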
