How to combine the complementary strengths of probabilistic graphical models and neural networks? We compose latent graphical models with neural network observation likelihoods. For inference, we use recognition networks to produce local evidence potentials, then combine them using efficient message-passing algorithms. All components are trained simultaneously with a single stochastic variational inference objective. We use this framework to automatically segment and categorize mouse behavior from raw depth video.