Learning Latent Events from Network Message Logs

Abstract

We consider the problem of separating error messages generated in large distributed data center networks into error events. In such networks, each error event leads to a stream of messages generated by all network components affected by the event. These messages are stored in a giant message log, with no information about the associated events. We study the unsupervised learning problem of identifying the signatures of the events that generated these messages; here, the signature of an error event refers to the probability distribution of messages generated by the event. We design a low-complexity algorithm for this purpose and demonstrate its scalability on a real dataset of 97 million messages collected over 15 days from a distributed data center network that supports the operations of a large wireless service provider.
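A minimal Python sketch can make the setup concrete. It models the generative story described above under simplifying assumptions that are ours, not the paper's. There are two hypothetical event types, and their signatures and message type names are invented. Each event emits a fixed number of messages near a random start time. The log keeps only (timestamp, message) pairs. The sketch does not implement the paper's algorithm. It only simulates a log in which event labels are lost, then computes the signatures that an unsupervised method would have to recover without those labels.

```python
import random
from collections import Counter, defaultdict

# Hypothetical signatures (assumption, for illustration only): each latent
# event type is a probability distribution over message types.
SIGNATURES = {
    "link_failure": {"LINK_DOWN": 0.6, "BGP_RESET": 0.3, "CRC_ERR": 0.1},
    "power_glitch": {"PSU_WARN": 0.7, "FAN_FAIL": 0.2, "LINK_DOWN": 0.1},
}

def generate_log(num_events, msgs_per_event, seed=0):
    """Simulate a message log. Each event draws its messages from its
    signature; the log stores only (timestamp, message), so the event label
    is dropped. Hidden labels are returned separately as ground truth."""
    rng = random.Random(seed)
    entries, hidden = [], []
    for _ in range(num_events):
        event = rng.choice(list(SIGNATURES))
        start = rng.uniform(0, 1000)
        types = list(SIGNATURES[event])
        weights = list(SIGNATURES[event].values())
        for msg in rng.choices(types, weights=weights, k=msgs_per_event):
            # Messages arrive shortly after the event's start time.
            entries.append((start + rng.expovariate(1.0), msg))
            hidden.append(event)
    # Sort by time so that messages from overlapping events interleave.
    order = sorted(range(len(entries)), key=lambda i: entries[i][0])
    return [entries[i] for i in order], [hidden[i] for i in order]

def empirical_signatures(log, labels):
    """Estimate each event's signature from labeled messages. An
    unsupervised method must recover these without access to `labels`."""
    counts = defaultdict(Counter)
    for (_, msg), event in zip(log, labels):
        counts[event][msg] += 1
    return {
        event: {m: c / sum(cnt.values()) for m, c in cnt.items()}
        for event, cnt in counts.items()
    }

log, labels = generate_log(num_events=200, msgs_per_event=50, seed=1)
est = empirical_signatures(log, labels)
```

`LINK_DOWN` appears in both hypothetical signatures, so a single message type does not identify the event that produced it. This ambiguity, combined with interleaving in time, is why the signatures have to be learned from the log's overall statistics rather than read off individual messages.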
