To improve trust in and adoption of collaborative learning systems, it is important to provide opt-out guarantees that enable the removal of unwanted (e.g., harmful or private) information after it has been used to train a model. A popular solution to this problem is approximate unlearning, which efficiently updates an LLM so that it behaves (roughly) as if it had never been trained on a given subset of the data. However, existing methods are brittle in practice and can easily be attacked to reveal supposedly unlearned information. To alleviate these issues with approximate unlearning, we instead propose SIFT-Masks (SIgn-Fixed Tuning-Masks), an exact unlearning method based on model merging. SIFT-Masks addresses two key limitations of standard model merging: (1) merging a large number of tasks can severely harm utility; and (2) methods that boost utility by sharing extra information across tasks make exact unlearning prohibitively expensive. SIFT-Masks solves these issues by (1) applying local masks to recover task-specific performance; and (2) constraining finetuning to align with a global sign vector, a lightweight way to determine masks independently before merging. Across four settings where we merge up to 500 models, SIFT-Masks improves accuracy by 5-80% over naive merging and uses up to 250x less compute for exact unlearning compared to other merging baselines.
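The merging-and-unlearning mechanics described above can be illustrated with a toy sketch. This is only a plausible reading of the abstract, not the paper's actual algorithm: the projection rule, the top-k mask, and all parameter names here are assumptions. The key property it demonstrates is that when each task's contribution is a masked, sign-constrained vector added to a shared base, exact unlearning reduces to subtracting that task's masked vector.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy parameter dimension

# Hypothetical shared global sign vector (an assumption about how
# SIFT-Masks fixes update directions before per-task finetuning).
global_sign = rng.choice([-1.0, 1.0], size=D)

def sign_fixed_update(raw_update, sign):
    """Zero out any entry of a raw finetuning update that disagrees
    with the global sign vector (illustrative projection, assumed)."""
    return np.where(np.sign(raw_update) == sign, raw_update, 0.0)

def local_mask(update, k=3):
    """Task-specific binary mask: keep the k largest-magnitude entries
    (k and the selection rule are illustrative assumptions)."""
    mask = np.zeros_like(update)
    mask[np.argsort(-np.abs(update))[:k]] = 1.0
    return mask

base = np.zeros(D)  # stand-in for the shared base model
task_updates = [sign_fixed_update(rng.normal(size=D), global_sign)
                for _ in range(3)]
masks = [local_mask(u) for u in task_updates]

# Merge: sum the masked, sign-aligned task vectors onto the base.
merged = base + sum(m * u for m, u in zip(masks, task_updates))

# Exact unlearning of task 0: subtract only its masked contribution.
unlearned = merged - masks[0] * task_updates[0]
```

Because masks are determined independently per task (thanks to the shared sign constraint), no cross-task information has to be recomputed when one task is removed.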
We show that even in decentralized learning, data can be leaked through the training of machine learning models, motivating the need for protection mechanisms. While differential privacy is the gold standard for centralized privacy-preserving machine learning, it is not well suited to decentralized learning, where participants collaboratively train a model via peer-to-peer messages. We present a relaxation of differential privacy that captures a relevant trust setting for decentralized learning, and we show that decentralization can lead to a better privacy-utility tradeoff in this regime.
We start with a brief overview of the current approach to federated learning (FL) and differential privacy (DP) for private training of small LMs, which powers production on-device decoding LMs in mobile keyboard applications. Recent technical advances include the BLT-DP-FTRL algorithm, which offers strong privacy-utility trade-offs and ease of use in deployment; the SI-CIFG model architecture, which enables efficient on-device training and compatibility with DP; and synthetic data from LLMs to improve (public) pre-training. This dedication to privacy-preserving learning to improve small LMs has not only delivered substantial user benefits, but has also helped improve LLMs in mobile typing applications, bridged by synthetic data. Next, we share our exploration over the past few years of generating and using synthetic data to improve LMs. We focus on approaches that adhere to the privacy principles of both data minimization and data anonymization, and show how they are making a real-world impact in small and large models. We discuss the use of synthetic data in complicated training paradigms, and algorithms that improve the quality and efficiency of synthetic data generation. Finally, we discuss the implications of recent trends in federated learning, open problems, and a preliminary study on personalization.
Differentially private training methods typically rely on injecting external noise at each iteration, as in DP-SGD, to limit the influence of individual data points. In this talk, we will explore how inherent algorithmic randomness already embedded in modern AI training pipelines for non-privacy reasons can be harnessed for privacy amplification, thereby reducing reliance on externally injected noise.
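The external-noise mechanism referenced above, DP-SGD, can be sketched in a few lines: each example's gradient is clipped to bound its influence, and calibrated Gaussian noise is added to the average. This is a minimal toy illustration (the hyperparameter names and the toy scale are our own choices, not a reference implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_step(params, per_example_grads, clip_norm=1.0,
                noise_mult=1.0, lr=0.1):
    """One DP-SGD step: clip each per-example gradient to clip_norm,
    average, then add Gaussian noise scaled to the clipping bound."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    avg = np.mean(clipped, axis=0)
    # Noise std: noise_mult * clip_norm on the summed gradient,
    # equivalently divided by the batch size on the average.
    noise = rng.normal(
        0.0, noise_mult * clip_norm / len(per_example_grads),
        size=avg.shape)
    return params - lr * (avg + noise)
```

Setting `noise_mult=0.0` recovers plain clipped SGD, which makes the role of the externally injected noise explicit; the talk's thesis is that some of this injected noise can be replaced by randomness already present in the pipeline.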
Prior work has studied privacy amplification through user or data subsampling, but largely under idealized assumptions such as independent Poisson subsampling. In practice, training pipelines exhibit more structured, system-driven forms of randomness. The goal of this talk is twofold: first, to move beyond idealized subsampling models toward structured sampling mechanisms that better reflect real-world constraints; and second, to investigate additional sources of algorithmic randomness, including model partitioning, dropout, and compression, that naturally limit how much information any single sample or user contributes to the final model. We will discuss how these mechanisms can be rigorously quantified to strengthen privacy guarantees at scale.
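The idealized Poisson subsampling baseline mentioned above is easy to state concretely. The sketch below shows independent per-record sampling and the standard amplification-by-subsampling bound for pure eps-DP (the function names are ours; the bound log(1 + q(e^eps - 1)) is the classical result this line of work generalizes):

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def poisson_subsample(n, q):
    """Idealized Poisson subsampling: each of n records joins the
    batch independently with probability q."""
    return np.nonzero(rng.random(n) < q)[0]

def amplified_eps(eps, q):
    """Classical amplification bound for pure eps-DP: running an
    eps-DP mechanism on a Poisson subsample with rate q yields
    log(1 + q*(exp(eps) - 1))-DP."""
    return math.log(1.0 + q * (math.exp(eps) - 1.0))
```

For small q the amplified guarantee is roughly q * eps, which is why subsampling is such a powerful lever; the talk's point is that real pipelines rarely sample this independently, motivating analyses of structured, system-driven randomness instead.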
Decoupled training methods, such as DiLoCo, relax the requirement for frequent synchronization, allowing large-scale models to be trained across distributed, high-latency compute clusters. This talk examines recent progress in making these methods work at scale and the ongoing development of "optimization for decoupled training." We then explore how the empirical success of decoupled training has unexpectedly influenced "standard" optimization, leading to new lookahead-inspired Nesterov-style methods. Finally, we highlight open questions at the intersection of decoupled training and federated learning. Throughout, we discuss key similarities and distinctions between these fields and ruminate on how heterogeneity, typically seen as a hurdle in federated settings, might actually serve as a powerful tool for improving decoupled training when injected intentionally.
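The decoupled-training loop discussed above can be sketched as a two-level optimization: workers take many cheap local steps between synchronizations, and an outer optimizer with Nesterov-style momentum applies the averaged delta once per round. This is a toy quadratic example under our own assumptions (loss, learning rates, and momentum form are illustrative), not DiLoCo's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_steps(params, data, inner_lr=0.05, steps=10):
    """Inner loop: many local SGD steps on one worker's shard,
    using a toy quadratic loss 0.5*||params - shard_mean||^2."""
    p = params.copy()
    target = data.mean(axis=0)
    for _ in range(steps):
        p -= inner_lr * (p - target)
    return p

def diloco_round(params, shards, momentum, outer_lr=0.7, beta=0.9):
    """Outer step: synchronize once per round by averaging the
    workers' deltas and applying them with Nesterov-style momentum."""
    deltas = [local_steps(params, s) - params for s in shards]
    avg_delta = np.mean(deltas, axis=0)
    momentum = beta * momentum + avg_delta
    params = params + outer_lr * (avg_delta + beta * momentum)
    return params, momentum

# Toy run: 4 workers, each with its own data shard, 5 outer rounds.
shards = [rng.normal(size=(20, 4)) for _ in range(4)]
params, momentum = np.zeros(4), np.zeros(4)
for _ in range(5):
    params, momentum = diloco_round(params, shards, momentum)
```

Communication happens only in `diloco_round`, once per round rather than once per gradient step, which is what makes the scheme tolerant of high-latency links between clusters. Note that the shards here are deliberately heterogeneous, echoing the talk's suggestion that heterogeneity may help rather than hurt.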