
Abstract
Post-training is essential for enhancing large language model (LLM) capabilities and aligning them with human preferences. One of the most widely used post-training techniques is reinforcement learning from human feedback (RLHF). In this talk, I will first discuss the challenges of applying RL to LLM training. Next, I will introduce RL algorithms that tackle these challenges by exploiting key properties of the underlying problem. Finally, I will present an approach that simplifies RL policy optimization for LLMs to relative reward regression.