Abstract
The classical Chow-Liu algorithm estimates a tree-structured graphical model of a distribution using the maximum spanning tree on the pairwise mutual information graph. We show that, if the true distribution P is tree-structured over a discrete domain Σ^n, then the Chow-Liu algorithm returns a distribution Q with D(P||Q) < ε after O~(Σ^3 n / ε) samples, which is nearly optimal in n and ε.
Our analysis of Chow-Liu is based on a new result in conditional independence testing: we prove that for three random variables X,Y,Z each over Σ, testing if I(X;Y∣Z) is 0 or ≥ε is possible with O˜(Σ^3/ε) samples using the empirical mutual information.