Stochastic gradient descent (SGD) methods are popular for large-scale optimization and learning because they scale well and come with good theoretical guarantees. They are also robust to noise, which is especially appealing in the context of privacy: it leads to a natural optimization method that uses noisy gradients. From a practical perspective, however, using SGD is often an art, since the user must carefully choose parameters to get good performance on real data. In this talk I will discuss how minibatching gives practical improvements for differentially private SGD and how it interacts with the step size of the algorithm.
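To make the idea of minibatched SGD with noisy gradients concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the algorithm presented in the talk: the logistic loss, per-example gradient clipping, Gaussian noise, and the 1/sqrt(t) step size are all choices made for this example, and the names `noisy_minibatch_sgd`, `noise_std`, and `clip_norm` are hypothetical. Calibrating `noise_std` to a formal privacy budget is deliberately left out.

```python
import math
import random


def clip(vec, max_norm):
    """Scale vec down so its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    return [v * max_norm / norm for v in vec]


def logistic_grad(w, x, y):
    """Gradient of log(1 + exp(-y * <w, x>)) in w, for a label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # 1 / (1 + exp(margin)) written with tanh to avoid overflow.
    coeff = -y * 0.5 * (1.0 - math.tanh(margin / 2.0))
    return [coeff * xi for xi in x]


def noisy_minibatch_sgd(data, dim, batch_size, noise_std,
                        clip_norm=1.0, epochs=1, seed=0):
    """Minibatch SGD where each averaged gradient is perturbed by noise.

    Assumptions (not from the talk): Gaussian noise, gradient clipping,
    and a step size of 1 / sqrt(t).
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = [data[i] for i in order[start:start + batch_size]]
            t += 1
            step = 1.0 / math.sqrt(t)
            # Sum the clipped per-example gradients over the minibatch.
            total = [0.0] * dim
            for x, y in batch:
                g = clip(logistic_grad(w, x, y), clip_norm)
                total = [a + b for a, b in zip(total, g)]
            # One noise draw per minibatch: after dividing by the batch
            # size, the noise in the averaged gradient shrinks as the
            # batch grows.
            noisy = [(s + rng.gauss(0.0, noise_std)) / len(batch)
                     for s in total]
            w = [wi - step * gi for wi, gi in zip(w, noisy)]
    return w


if __name__ == "__main__":
    toy = [([1.0], 1), ([-1.0], -1)] * 50
    print(noisy_minibatch_sgd(toy, dim=1, batch_size=10, noise_std=0.5))
```

In this sketch the noise is added once per minibatch rather than once per example, so its contribution to the averaged gradient scales as `noise_std / batch_size`. That is one way to see why the batch size and the step size have to be tuned together.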

Joint work with Shuang Song and Kamalika Chaudhuri.
