Abstract
Modern multi-modal models such as CLIP require significant engineering effort to train, evaluate, and deploy efficiently. Furthermore, such models typically serve as backbone feature extractors for many downstream tasks. This talk will provide an overview of how we’ve accomplished this at Apple, where CLIP now powers a large number of user experiences on iOS. We’ll cover concepts such as multi-node, multi-GPU distributed training on billions of examples, transfer learning for downstream tasks, model pruning, efficient on-device inference with transformers, and more.
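To make one of the listed techniques concrete, the following is a minimal, self-contained sketch of magnitude-based weight pruning — the common baseline in which the smallest-magnitude fraction of weights is zeroed. This is purely illustrative and not Apple's implementation; the function name and the plain-list representation of weights are hypothetical simplifications (real pruning operates on framework tensors, often layer-wise or with fine-tuning afterward).

```python
# Illustrative sketch (hypothetical, not Apple's implementation):
# global magnitude pruning over a flat list of weights.
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude
    `sparsity` fraction of entries set to zero."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    k = int(len(weights) * sparsity)  # number of weights to zero out
    if k == 0:
        return list(weights)
    # Threshold is the k-th smallest absolute value.
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

# Pruning 40% of 5 weights removes the two smallest in magnitude.
print(magnitude_prune([0.9, -0.05, 0.4, 0.01, -0.7], 0.4))
# → [0.9, 0.0, 0.4, 0.0, -0.7]
```

The same idea scales to transformer backbones like CLIP's image encoder, where sparsity reduces on-device memory and compute at some accuracy cost.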