While large pre-trained language models (LLMs) have enabled impressive results on a wide variety of tasks, even the largest existing models answer inconsistently or head off in strange directions. For companies to gain the benefits of these models in production, it is now necessary to build an extensive tool ecosystem around the LLM engine, just as cars have seat belts, dashboard warning lights, and anti-lock brakes. In this talk, I will present recent work on three such tools: (1) ConCORD, a lightweight method for improving LLM consistency using off-the-shelf Natural Language Inference models; (2) DetectGPT, a method for better detecting LLM-generated text by examining the curvature of the model's log probability function; and (3) Direct Preference Optimization, a new way of steering LLMs with human preference data without needing to learn a reward model. Joint work with Eric Mitchell, Chelsea Finn, and many other Stanford coauthors.