Abstract

Deploying neural network models in real-world safety-critical applications necessitates going beyond empirical validation of accuracy on test data. Adversarial examples for image classifiers are evidence of the need for a more formal verification paradigm. While progress is starting to be made in this direction, most existing approaches apply only to networks with piecewise-linear non-linearities (for example, ReLUs and max-pooling). This paper overcomes this limitation by developing a general framework for model verification that applies to feed-forward networks with arbitrary component-wise non-linearities (including tanh, sigmoid, and ELUs) as well as max-pooling. Our method formulates the verification problem as an optimization problem and solves a Lagrangian relaxation of it to obtain a bound on the verification objective. We demonstrate the practical efficiency of our approach on several tasks: verifying the adversarial robustness of image classifiers and verifying the stability of neural network predictions to features evolving over time.
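The core idea, relaxing the network's equality constraints with Lagrange multipliers so that any choice of multiplier yields a valid upper bound on the verification objective (weak duality), can be illustrated on a toy one-neuron network. The sketch below is purely illustrative and is not the paper's implementation: the tiny problem instance, the function names, and the grid search over the multiplier are all invented for the example.

```python
import numpy as np

# Toy verification problem: bound  max_{x in [l, u]}  w * tanh(a*x + b).
# Introduce the intermediate variable z = a*x + b and relax this equality
# with a multiplier lam. For ANY lam, maximizing the decoupled terms
# independently over their boxes upper-bounds the true maximum (weak duality).

def dual_bound(w, a, b, l, u, lam):
    # Interval bounds on the pre-activation z implied by x in [l, u].
    zl = min(a * l, a * u) + b
    zu = max(a * l, a * u) + b
    # max over x in [l, u] of lam * (a*x + b): linear in x, so attained
    # at one of the two endpoints.
    term_x = max(lam * (a * l + b), lam * (a * u + b))
    # max over z in [zl, zu] of w*tanh(z) - lam*z: evaluated on a fine grid
    # here for simplicity (a closed-form 1-D analysis would also work).
    zs = np.linspace(zl, zu, 10001)
    term_z = np.max(w * np.tanh(zs) - lam * zs)
    return term_x + term_z

w, a, b, l, u = 2.0, 1.5, -0.5, -1.0, 1.0

# Ground truth by brute force (only feasible because the example is 1-D).
xs = np.linspace(l, u, 10001)
true_max = np.max(w * np.tanh(a * xs + b))

# Every multiplier gives a sound upper bound; minimizing over lam tightens it.
bound = min(dual_bound(w, a, b, l, u, lam) for lam in np.linspace(-3.0, 3.0, 61))
assert bound >= true_max - 1e-6  # the relaxation never under-estimates
```

In the full method the same decoupling is applied layer by layer, which is what makes arbitrary component-wise non-linearities tractable: each relaxed subproblem involves only one activation function over a box.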

Video Recording