In recent years, Differentially Private Follow-The-Regularized-Leader (DP-FTRL) style algorithms have become increasingly popular for training large models over data streams, e.g., for training DP models for Gboard next-word prediction (https://research.google/blog/federated-learning-with-formal-differentia… and https://arxiv.org/abs/2305.18465). At the heart of these algorithms is a correlated noise addition mechanism: it factorizes the square lower-triangular all-ones matrix as A = BC and adds noise BZ (where Z has i.i.d. N(0, σ^2) entries and shape (training steps) x (dimensions)) to the prefix sums of the gradients observed during training. Prior work required generating and storing the noise matrix in advance, which is prohibitively expensive for large-scale model training.
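As an illustration of this mechanism (not the construction from the talk), the sketch below builds the all-ones lower-triangular prefix-sum matrix A, factorizes it as A = BC using the classic lower-triangular Toeplitz square root (whose coefficients come from the power series of 1/sqrt(1-x)), and adds BZ to the gradient prefix sums; the toy sizes and the specific factorization are assumptions for illustration:

```python
import numpy as np

n, d = 8, 3  # toy number of training steps and model dimension
rng = np.random.default_rng(0)

# A: square lower-triangular all-ones matrix, so A @ G gives prefix sums of gradients.
A = np.tril(np.ones((n, n)))

# One classic LTT factorization A = B @ C with B = C: the Toeplitz coefficients
# of the "square root" are those of 1/sqrt(1 - x), i.e. binom(2k, k) / 4^k,
# computed here via the recurrence c_k = c_{k-1} * (2k - 1) / (2k).
coef = np.ones(n)
for k in range(1, n):
    coef[k] = coef[k - 1] * (2 * k - 1) / (2 * k)

# Build the lower-triangular Toeplitz matrix B from its first column.
B = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1):
        B[i, j] = coef[i - j]
C = B.copy()
assert np.allclose(B @ C, A)  # the factorization is exact up to floating point

G = rng.normal(size=(n, d))   # stand-in for gradients observed during training
Z = rng.normal(size=(n, d))   # i.i.d. Gaussian noise
noisy_prefix_sums = A @ G + B @ Z  # equals B @ (C @ G + Z): the DP release
```

Note that materializing B (or Z) up front costs O(n^2) (or O(n x d)) memory, which is exactly the bottleneck the talk addresses.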

In this talk, we introduce a new algorithmic construction called the Buffered Linear Toeplitz operator (BLT) that allows generating the noise required for DP in a streaming fashion with a buffer of size approximately log^2(training steps) x dimensions. Furthermore, compared to the class of all Lower Triangular Toeplitz (LTT) factorizations (i.e., B and C both being LTT), BLTs lose only an additive constant in utility under standard error metrics. The talk will draw on ideas from rational function approximation and constant-recurrence sequences for the construction of BLTs, and will demonstrate the practicality of this work through simulation experiments.
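To make the streaming idea concrete, here is a minimal sketch of buffered noise generation, under the assumption that B is lower-triangular Toeplitz with coefficients given by a small sum of geometric (constant-recurrence) sequences; the buffer count and the parameters `theta` and `omega` below are hypothetical placeholders, not the optimized values from the paper:

```python
import numpy as np

n, d = 10, 4       # toy number of training steps and model dimension
num_buffers = 2    # memory is O(num_buffers x d), not O(n x d)
# Hypothetical parameters; in practice they are optimized against the error metric.
theta = np.array([0.9, 0.5])   # per-buffer decay rates
omega = np.array([0.6, 0.3])   # per-buffer output weights

rng = np.random.default_rng(1)
Z = rng.normal(size=(n, d))

# Dense reference: B is LTT with b_0 = 1 and b_k = sum_j omega_j * theta_j^(k-1).
b = np.ones(n)
for k in range(1, n):
    b[k] = np.sum(omega * theta ** (k - 1))
B = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1):
        B[i, j] = b[i - j]
dense_noise = B @ Z  # what the non-streaming mechanism would materialize

# Streaming equivalent: a small buffer per geometric component.
S = np.zeros((num_buffers, d))
stream_noise = np.zeros((n, d))
for t in range(n):
    stream_noise[t] = Z[t] + omega @ S   # emit correlated noise for step t
    S = theta[:, None] * S + Z[t]        # decay-and-accumulate buffer update

assert np.allclose(stream_noise, dense_noise)  # same noise, O(num_buffers x d) memory
```

The buffer update is a constant-coefficient linear recurrence, which is why coefficient sequences of this geometric form admit exact streaming generation.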

The talk is primarily based on https://arxiv.org/pdf/2404.16706, and is joint work with Krishnamurthy (Dj) Dvijotham, Brendan McMahan, Krishna Pillutla, and Thomas Steinke.