Abstract
Understanding real-world usage is critical for improving Generative AI, but analyzing usage data risks exposing sensitive user inputs. Platforms mitigate this risk with "privacy-aware" heuristics such as PII redaction and clustering, but are these protections actually secure? We put these claims to the test by introducing CLIOPATRA, the first successful privacy attack against Anthropic's CLIO. We demonstrate how an adversary can insert malicious chats to systematically bypass layered protections and leak sensitive data. Evaluated against synthetic medical chats, CLIOPATRA shows that an attacker who knows only a target's basic demographics and a single symptom can extract the target's full medical history up to 100% of the time, showing that ad-hoc, heuristic mitigations are fundamentally unreliable. If heuristics fail, how can developers safely extract insights? To answer this, we introduce Provably Private Insights (PPI), a novel framework that abandons heuristics in favor of mathematically guaranteed privacy. PPI bridges the gap between raw data and analytics by integrating Trusted Execution Environments (TEEs) for external verifiability, "Data Expert" LLMs operating within secure enclaves, and Differential Privacy (DP) for anonymous aggregation. By walking through PPI's open-source architecture and its real-world deployment in Google's Android Recorder app, this talk demonstrates the practicality of provably private AI analytics at scale.