Abstract

Recent work has investigated saddlepoint reformulations of objectives in RL, for both the mean-squared Bellman error (MSBE) and the mean-squared projected Bellman error (MSPBE). In this talk, I will discuss how this view provides a family of projected Bellman error objectives that includes the MSBE and MSPBE as special cases. This view (1) more naturally enables extensions to nonlinear value estimation, and (2) provides insight into the two common strategies for sampling the gradient of the MSPBE.
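
As a point of reference, a minimal sketch of the saddlepoint form alluded to above, assuming linear value estimation with features $\phi$, TD error $\delta$, and value parameters $\theta$ (notation chosen here for illustration, not taken from the talk):

$$
\mathrm{MSPBE}(\theta)
= \mathbb{E}[\delta \phi]^{\top}\, \mathbb{E}[\phi \phi^{\top}]^{-1}\, \mathbb{E}[\delta \phi]
= \max_{h}\; 2\, h^{\top} \mathbb{E}[\delta \phi] \;-\; h^{\top} \mathbb{E}[\phi \phi^{\top}]\, h,
$$

where the inner maximization over an auxiliary weight vector $h$ replaces the matrix inverse, yielding a saddlepoint objective in $(\theta, h)$ of the kind used to derive gradient-TD-style algorithms.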

Video Recording