Meta Learning, Black Box Optimization, and OpenAI

The first part is on RL^2: fast reinforcement learning via slow reinforcement learning. The idea is to use a slow reinforcement learning algorithm to train a recurrent neural network policy that, in effect, acts as a fast reinforcement learning algorithm. We demonstrate the validity of this approach by showing that it learns to solve bandit problems and small tabular MDPs, and that it scales reasonably well with today's RL techniques to maze navigation in 3D environments.
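To make the setup concrete, here is a minimal sketch of the RL^2 interaction loop on a multi-armed bandit. In the actual method, a trained RNN policy maps the previous action, reward, and its hidden state to the next action; in this sketch a hand-coded stand-in (running reward estimates with a UCB-style exploration bonus) plays that role so the loop is runnable. The function name and all hyperparameters are illustrative, not from the talk.

```python
import numpy as np

def rl2_trial(arm_probs, n_steps=500, seed=0):
    """Sketch of one RL^2 trial on a Bernoulli bandit.

    The "hidden state" (pull counts and reward sums) persists across steps
    within the trial and would be reset between trials, mirroring how the
    recurrent policy carries information across episodes of the same task.
    """
    rng = np.random.default_rng(seed)
    k = len(arm_probs)
    counts = np.zeros(k)   # stand-in hidden state: pulls per arm
    sums = np.zeros(k)     # stand-in hidden state: reward per arm
    total = 0.0
    for t in range(n_steps):
        # UCB-style action selection, standing in for the RNN's
        # learned exploration strategy
        bonus = np.sqrt(2 * np.log(t + 1) / (counts + 1e-8))
        a = int(np.argmax(sums / (counts + 1e-8) + bonus))
        r = float(rng.random() < arm_probs[a])  # Bernoulli reward
        counts[a] += 1
        sums[a] += r
        total += r
        # (in RL^2, the action/reward pair is fed back into the RNN here)
    return total / n_steps

avg = rl2_trial([0.2, 0.8])
```

The point of the sketch is the interface, not the stand-in policy: the slow outer algorithm would train the RNN so that this within-trial adaptation emerges from its weights rather than being hand-coded.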

The second part is about Evolution Strategies, a derivative-free optimization method that has been rediscovered and explored by many research communities. We show that (a) it is competitive with today's RL algorithms on standard RL benchmarks, and (b) it scales *extremely* well with the number of workers. This result is unexpected, because our models have hundreds of thousands of dimensions, and it was not obvious that Evolution Strategies would succeed on problems of such high dimensionality.
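The core of Evolution Strategies can be sketched in a few lines: perturb the parameters with Gaussian noise, evaluate the objective at each perturbation, and move the parameters along the noise directions weighted by their returns. The sketch below runs on a toy quadratic objective; the function name and hyperparameters are illustrative, and centering the returns is one common variance-reduction choice among several.

```python
import numpy as np

def evolution_strategies(f, theta, sigma=0.1, alpha=0.05, npop=50, iters=200, seed=0):
    """Maximize f by following the ES gradient estimate.

    Each iteration samples npop Gaussian perturbations, evaluates f at the
    perturbed parameters, and ascends the estimate
        (1 / (npop * sigma)) * sum_i F_i * eps_i,
    with the mean return subtracted as a baseline to reduce variance.
    """
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        eps = rng.standard_normal((npop, theta.size))          # perturbations
        returns = np.array([f(theta + sigma * e) for e in eps])
        adv = returns - returns.mean()                          # baseline
        theta = theta + alpha / (npop * sigma) * eps.T @ adv    # ascent step
    return theta

# toy objective: maximized at theta == target
target = np.array([3.0, -1.0, 2.0])
f = lambda x: -np.sum((x - target) ** 2)
theta = evolution_strategies(f, np.zeros(3))
```

Because each worker only needs the perturbation seed and its scalar return, the communication cost per worker is tiny, which is what makes the method scale so well with the number of workers.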

I'll finish the talk with a brief overview of a few other research projects taking place at OpenAI.
