Neural networks trained on large datasets have powered notable successes across many domains. However, such models often fail under distribution shift because they learn spurious correlations present in the training data. In this talk, I will discuss how to train neural networks that are robust to known spurious correlations. We will see that this leads to counter-intuitive observations about the effect of model size and training data. Finally, I will discuss some approaches to mitigating the effect of spurious correlations when information about them is incomplete.