Reinforcement Learning without Reinforcement (or Depth!)
Reinforcement learning is concerned with solving sequential decision-making problems under uncertainty; ideally, no model is assumed. Such algorithms are often stochastic approximation schemes that provably converge almost surely but are known to be rather slow. Deep neural network-based function approximation is combined with such stochastic approximation schemes in various applications, though these combinations provide no performance guarantees. In this talk, I will begin with the premise that asymptotic convergence to the optimum is an unnecessary goal for algorithm design; at the same time, some guarantees, even if only probabilistic, are essential. We will then see that some very simple empirical algorithms are remarkably effective in practice, and that we can provide guarantees for them as well. Specifically, we will view each iteration of such an algorithm as an application of a random operator that is not necessarily a contraction. Nevertheless, we can argue convergence to a probabilistic fixed point of the random operator. This is accomplished via an easy-to-operationalize but quite powerful technique: stochastic dominance by a Markov chain. Empirical algorithms for finite state and action spaces, Q-value functions, and asynchronous/online settings will be presented. Finally, we will demonstrate how randomized kernel-based function fitting can be combined with empirical algorithms to yield `universal' reinforcement learning algorithms for continuous MDPs with provable probabilistic performance guarantees.
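To make the idea of a random Bellman operator concrete, here is a minimal sketch (not the speaker's actual algorithm) of empirical Q-value iteration on a toy finite MDP: each iterate replaces the exact expectation in the Bellman operator with an average over sampled next states, so every application of the operator is random. All names, sizes, and parameters below are illustrative assumptions.

```python
# Illustrative sketch of empirical Q-value iteration on a toy MDP.
# Instead of the exact expectation in the Bellman operator, each
# iterate applies a *random* operator: the expectation over next
# states is replaced by an average over sampled transitions.
import numpy as np

rng = np.random.default_rng(0)

S, A = 5, 3          # small finite state/action spaces (assumed sizes)
gamma = 0.9          # discount factor
P = rng.dirichlet(np.ones(S), size=(S, A))   # random transition kernel P[s, a, s']
R = rng.random((S, A))                       # random rewards in [0, 1]

def empirical_bellman(Q, n_samples=50):
    """One application of the random (empirical) Bellman operator."""
    Q_next = np.empty_like(Q)
    for s in range(S):
        for a in range(A):
            # sample next states instead of taking the exact expectation
            ns = rng.choice(S, size=n_samples, p=P[s, a])
            Q_next[s, a] = R[s, a] + gamma * np.mean(Q[ns].max(axis=1))
    return Q_next

Q = np.zeros((S, A))
for _ in range(200):
    Q = empirical_bellman(Q)

# Exact value iteration for comparison (classical fixed point)
Q_exact = np.zeros((S, A))
for _ in range(200):
    Q_exact = R + gamma * np.einsum('sap,p->sa', P, Q_exact.max(axis=1))

# The empirical iterates hover near the true fixed point:
print(np.max(np.abs(Q - Q_exact)))
```

The empirical operator is not a contraction, so the iterates never converge exactly; they instead concentrate near the true Q-function, which is the behavior the stochastic-dominance argument in the talk is designed to quantify.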
Rahul Jain is the K. C. Dahlberg Early Career Chair and Associate Professor of Electrical Engineering and Computer Science at the University of Southern California (USC), Los Angeles, CA, with a courtesy appointment in the ISE Department. He received a B.Tech. from IIT Kanpur, and an M.A. in Statistics and a Ph.D. in EECS from the University of California, Berkeley. Prior to joining USC, he was at the IBM T. J. Watson Research Center, Yorktown Heights, NY. He has received numerous awards, including the NSF CAREER Award, the ONR Young Investigator Award, an IBM Faculty Award, and the James H. Zumberge Faculty Research and Innovation Award, and he is currently a US Fulbright Specialist Scholar. His interests span reinforcement learning, stochastic control, statistical learning, stochastic networks, game theory, and power system economics.
Anyone who would like to give one of the weekly seminars in the RTDM program can fill out the survey at https://goo.gl/forms/Li5jQ0jm01DeYZVC3.