Abstract

This talk focuses on the statistical sample complexity and model reduction of Markov decision processes (MDPs). We begin by surveying recent advances on the complexity of solving MDPs without any dimension reduction. In the first part, we study the statistical state compression of general Markov processes. We propose a spectral state compression method for learning state features and aggregation structures from data. The state compression method is able to “sketch” a black-box Markov process from its empirical data, for which both minimax statistical guarantees and scalable computational tools are provided. In the second part, we propose a bilinear primal-dual pi learning method for finding the optimal policy, which utilizes given state and action features. The method is motivated by a saddle point formulation of the Bellman equation. Its sample complexity depends only on the number of parameters and is invariant with respect to the dimension of the problem, making high-dimensional reinforcement learning possible using “small” data.
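
As a rough illustration of the spectral state compression idea in the first part (a minimal sketch, not the talk's exact estimator), the snippet below builds an empirical transition matrix from a single observed state trajectory and takes a truncated SVD of it; the leading singular subspaces play the role of learned state features, and the rank-r reconstruction is the low-rank “sketch” of the chain. The function name, interface, and the omission of the clustering step that recovers aggregation structure are all assumptions for illustration.

```python
import numpy as np

def spectral_state_compression(trajectory, n_states, rank):
    """Hypothetical sketch: estimate state features and a rank-`rank`
    transition matrix from one trajectory of integer state indices."""
    # Empirical transition counts and a row-normalized transition estimate.
    counts = np.zeros((n_states, n_states))
    for s, s_next in zip(trajectory[:-1], trajectory[1:]):
        counts[s, s_next] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0      # unvisited states: their rows stay zero
    P_emp = counts / row_sums

    # Truncated SVD of the empirical matrix: leading singular subspaces act as
    # state features; clustering their rows (omitted) gives an aggregation.
    U, sigma, Vt = np.linalg.svd(P_emp, full_matrices=False)
    U_r, s_r, Vt_r = U[:, :rank], sigma[:rank], Vt[:rank, :]
    P_lowrank = U_r @ np.diag(s_r) @ Vt_r   # low-rank "sketch" of the process
    return P_lowrank, U_r, Vt_r.T

# Example usage on synthetic placeholder data with 6 states and 2 meta-states.
rng = np.random.default_rng(0)
trajectory = rng.integers(0, 6, size=10_000)
P_hat, left_features, right_features = spectral_state_compression(trajectory, n_states=6, rank=2)
```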
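
For the second part, one standard way to write the saddle point formulation of the Bellman equation that motivates primal-dual methods is the Lagrangian of its linear-programming form, stated here under the assumption of a γ-discounted MDP with reward r, transition kernels P_a, initial distribution ξ, value vector v, and a nonnegative dual variable μ over state-action pairs (the talk's exact bilinear parameterization is not reproduced):

```latex
\[
\min_{v}\;\max_{\mu \ge 0}\;\;
(1-\gamma)\,\xi^{\top} v
\;+\; \sum_{s,a} \mu(s,a)\Bigl( r(s,a) + \gamma \sum_{s'} P_a(s' \mid s)\, v(s') - v(s) \Bigr).
\]
```

Roughly speaking, restricting v and μ to the span of the given state and action features turns this into a bilinear saddle point problem over the low-dimensional parameters, which is why the resulting sample complexity can depend only on the number of parameters rather than on the sizes of the state and action spaces.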