Defense against prompt injection attacks

Workshop

Alignment, Trust, Watermarking, and Copyright Issues in LLMs

Speaker(s)

David Wagner (UC Berkeley)

Location

Calvin Lab Auditorium

Date

Monday, Oct. 14, 2024

Time

2 – 2:45 p.m. PT

Abstract

Prompt injection attacks are a significant threat to the security of LLM-integrated applications. These attacks exploit the lack of a clear separation between instructions/prompts and user data. I will introduce the notion of structured queries, a general approach to tackle this problem by explicitly separating prompt and data and training LLMs to respect this separation. I will describe how to adjust standard instruction tuning to respect this separation, and show the resulting models provide significant improvements in robustness against prompt injection.

Defense against prompt injection attacks

Abstract

Video Recording