Abstract

As LLMs evolve into persistent personal agents that manage calendars, emails, and health records across sessions, they accumulate rich user memories that enable powerful personalization but create new privacy risks. The recent explosion of tools like OpenClaw, where tens of thousands of always-on AI agents were deployed with full access to users' messages, credentials, and conversation histories, makes these risks concrete and urgent. What should an agent remember, and who should it tell? In this talk, we explore both sides of this question. First, through CIMemories [ICLR 2026], we introduce a compositional benchmark for evaluating whether LLMs respect contextual integrity when drawing on persistent memory. Our evaluation reveals that frontier models exhibit up to 69% attribute-level violations, leaking sensitive information in inappropriate contexts, and that these violations accumulate unpredictably across tasks and runs, exposing fundamental instability in how models reason about context-dependent disclosure. We then ask: can we architect systems that avoid the trade-off between privacy and utility entirely? In PPMI, we present a hybrid framework that decomposes tasks between a powerful but untrusted remote LLM and a trusted local model, using Socratic chain-of-thought reasoning and homomorphically encrypted vector search over private data. Our approach, pairing GPT-4o with a local Llama-3.2-1B, outperforms GPT-4o alone on long-context QA, demonstrating that privacy and utility need not be at odds. We conclude by arguing that the disclosure failures CIMemories exposes are not bugs that scale will fix: they reflect a missing notion of contextual norms in model training and architecture. As agents gain persistent memory and autonomy, the line between personalization and surveillance thins, making principled privacy reasoning not just a feature but a prerequisite for trustworthy AI.

Video Recording