Abstract

Correlation mining is an emerging area of data mining whose objective is to learn patterns of correlation among a large number of observed variables from a limited number of samples. It can be framed as the mathematical problem of reliably reconstructing different attributes of the correlation matrix, or its inverse, from the sample covariance matrix constructed from the data. Some attributes can be reconstructed from relatively few samples, e.g., screening for variables that are hubs of high correlation in a sparsely correlated population. Others require many more samples, e.g., accurately estimating all entries of the inverse covariance matrix in a densely correlated population. We will discuss how the sampling requirements of correlation mining depend on the inference task. This is joint work with Bala Rajaratnam at Stanford University.
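
As a rough illustration of the hub-screening task mentioned in the abstract, the following sketch thresholds the sample correlation matrix and counts, for each variable, how many other variables it is strongly correlated with. It is not the speakers' method: the function name `screen_hubs`, the threshold `rho`, the degree cutoff `min_degree`, and the synthetic data (a small group of mutually correlated variables planted in otherwise independent noise) are all assumptions made for this example.

```python
import numpy as np


def screen_hubs(X, rho=0.8, min_degree=4):
    """Flag variables with many large sample correlations.

    X is an (n samples) x (p variables) array. Returns the indices of
    variables whose absolute sample correlation exceeds `rho` with at
    least `min_degree` other variables, plus the degree of every variable.
    Both `rho` and `min_degree` are illustrative choices.
    """
    R = np.corrcoef(X, rowvar=False)   # p x p sample correlation matrix
    np.fill_diagonal(R, 0.0)           # ignore self-correlation
    degree = (np.abs(R) > rho).sum(axis=1)
    return np.flatnonzero(degree >= min_degree), degree


# Synthetic example: few samples (n = 40), many variables (p = 200).
rng = np.random.default_rng(0)
n, p = 40, 200
X = rng.standard_normal((n, p))

# Plant a small group of highly correlated variables around variable 0.
for j in range(1, 6):
    X[:, j] = X[:, 0] + 0.2 * rng.standard_normal(n)

hubs, degree = screen_hubs(X, rho=0.8, min_degree=4)
print("flagged variables:", hubs)
```

Here the number of samples is far smaller than the number of variables, yet a high threshold keeps spurious correlations among the independent variables from being flagged. Estimating every entry of the inverse covariance matrix in the same setting would not be possible without further structure, since the sample covariance matrix is singular when n < p.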
