Playlist: 25 videos

### Algorithmic Aspects of Causal Inference

0:36:45

Guy Van den Broeck (UCLA)

https://simons.berkeley.edu/talks/tractable-probabilistic-circuits

Algorithmic Aspects of Causal Inference

This talk will give an overview of recent developments in (non-causal) probabilistic models that are tractable for computing properties such as marginal probabilities, entropies, expectations, and related queries of interest. These tractable probabilistic circuit models are now also effectively learned from data and can provide an efficient probabilistic foundation for causal inference algorithms.
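
The tractability property the abstract refers to can be illustrated with a minimal sketch (all parameters invented, not from the talk): a probabilistic circuit over two binary variables, built as a sum node over two product nodes. Any marginal is obtained by setting the marginalized variable's indicator leaves to 1 and doing a single bottom-up pass.

```python
def bern_unit(lam1, lam0, p):
    # Weighted sum over the two indicator leaves of one binary variable.
    return p * lam1 + (1 - p) * lam0

def circuit(l11, l10, l21, l20):
    # Root sum node over two product nodes (smooth and decomposable).
    comp_a = bern_unit(l11, l10, 0.9) * bern_unit(l21, l20, 0.2)
    comp_b = bern_unit(l11, l10, 0.1) * bern_unit(l21, l20, 0.7)
    return 0.5 * comp_a + 0.5 * comp_b

# Full evidence: P(X1 = 1, X2 = 0).
p_10 = circuit(1, 0, 0, 1)

# Marginal P(X1 = 1): set both X2 indicators to 1, so X2 is summed out
# implicitly in one pass, with no enumeration over assignments.
p_x1 = circuit(1, 0, 1, 1)
```

The same single-pass trick extends to expectations and entropies under structural conditions on the circuit, which is what makes these models a candidate probabilistic backend for causal inference algorithms.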

0:40:06

Aditi Raghunathan (UC Berkeley)

https://simons.berkeley.edu/talks/neural-networks-and-spurious-correlations

Algorithmic Aspects of Causal Inference

Neural networks trained on large datasets have powered several notable successes across various domains. However, such models often fail in the presence of distribution shifts because they can pick up on spurious correlations present in the dataset. In this talk, I will discuss how to train neural networks that are robust to known spurious correlations. We will see that this leads to counter-intuitive observations about the effect of model size and training data. Finally, I will discuss some approaches to mitigating the effect of spurious correlations in the absence of complete information about the spurious correlations.
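
A toy simulation (numbers invented, not from the talk) makes the failure mode concrete: a feature that is spuriously aligned with the label during training looks excellent in-distribution, then collapses when the correlation flips under a shift, while a weaker "core" feature stays stable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
y = rng.integers(0, 2, n)

core = y + 0.8 * rng.standard_normal(n)              # weak but causal signal
spur_train = y + 0.1 * rng.standard_normal(n)        # spurious, aligned in training
spur_shift = (1 - y) + 0.1 * rng.standard_normal(n)  # alignment flips at test time

def acc(feature):
    # Thresholding at 0.5 plays the role of a trained classifier.
    return ((feature > 0.5).astype(int) == y).mean()

assert acc(spur_train) > 0.95   # spurious feature looks great in-distribution
assert acc(spur_shift) < 0.05   # ...and fails once the correlation flips
assert 0.55 < acc(core) < 0.9   # core feature: mediocre but shift-stable
```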

0:44:10

Ankur Moitra (MIT)

https://simons.berkeley.edu/talks/tbd-394

Algorithmic Aspects of Causal Inference

Graphical models are a rich language for describing high-dimensional distributions in terms of their dependence structure. While there are algorithms with provable guarantees for learning undirected graphical models in a variety of settings, there has been much less progress in the important scenario when there are latent variables, which is the focus of our work.

In particular, we study Restricted Boltzmann Machines (RBMs), which are a popular model with wide-ranging applications. We give a simple greedy algorithm for learning ferromagnetic RBMs. Our analysis is based on tools from mathematical physics that were developed to show the concavity of magnetization. Conversely, we show that even for a constant number of latent variables with constant degree, without ferromagneticity the problem is as hard as sparse parity with noise. This hardness result is based on a sharp and surprising characterization of the representational power of bounded-degree RBMs.

Based on joint work with Guy Bresler and Frederic Koehler.
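
To fix notation, here is a hypothetical tiny RBM (parameters invented): 3 visible and 2 hidden units in {0, 1}, with all couplings nonnegative, which is the ferromagnetic condition in the abstract. Brute-force marginalization over the hidden layer, as done below, is exponential; the talk's contribution is learning without it.

```python
import itertools
import math

W = [[0.5, 0.2],
     [0.3, 0.4],
     [0.0, 0.6]]  # W[i][j] couples visible unit v_i to hidden unit h_j

def weight(v, h):
    # Unnormalized Gibbs weight exp(v^T W h) of a joint configuration.
    e = sum(W[i][j] * v[i] * h[j] for i in range(3) for j in range(2))
    return math.exp(e)

configs_v = list(itertools.product([0, 1], repeat=3))
configs_h = list(itertools.product([0, 1], repeat=2))
Z = sum(weight(v, h) for v in configs_v for h in configs_h)

def p_visible(v):
    # Marginal of the visible layer: sum out the latent units.
    return sum(weight(v, h) for h in configs_h) / Z

total = sum(p_visible(v) for v in configs_v)
assert abs(total - 1.0) < 1e-12
```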

0:32:21

Raghu Meka (UCLA)

https://simons.berkeley.edu/talks/tbd-395

Algorithmic Aspects of Causal Inference

Gaussian graphical models (GGMs) have wide-ranging applications in machine learning and the natural and social sciences, where they are one of the most popular ways to model statistical relationships between observed variables. In most of the settings in which they are applied, the number of observed samples is much smaller than the dimension, and the goal is to learn the model assuming the underlying model is sparse. While there are a variety of algorithms (e.g., Graphical Lasso, CLIME) that provably recover the graph structure with a logarithmic number of samples, they assume various conditions that require the precision matrix to be in some sense well-conditioned.

I will talk about the first fixed polynomial-time algorithms for learning attractive GGMs and walk-summable GGMs with a logarithmic number of samples without any such assumptions. In particular, our algorithms can tolerate strong dependencies among the variables. We complement our results with experiments showing that many existing algorithms fail even in some simple settings where there are long dependency chains.

Joint work with Jonathan Kelner, Frederic Koehler, and Ankur Moitra.
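
The role of the precision matrix can be seen in a three-line example (illustration only, not the talk's algorithm): for a chain 0 - 1 - 2 the end variables are marginally correlated, yet the zero entry of the precision matrix records the missing edge, i.e. conditional independence given the middle variable. The nonpositive off-diagonals also make this an attractive GGM in the abstract's sense.

```python
import numpy as np

# Precision matrix of a 3-variable Gaussian chain 0 - 1 - 2.
Theta = np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  2.0]])
Sigma = np.linalg.inv(Theta)

# Marginally, 0 and 2 are correlated (a long dependency chain)...
assert abs(Sigma[0, 2]) > 1e-6
# ...but Theta[0, 2] = 0 encodes the missing edge: X0 is independent
# of X2 given X1.
assert Theta[0, 2] == 0.0
```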

0:30:06

Frederic Koehler (Stanford)

https://simons.berkeley.edu/talks/preconditioning-sparse-linear-regression-using-graphical-structure

Algorithmic Aspects of Causal Inference

It is well-known that standard methods for sparse linear regression such as the LASSO generally perform worse as input features in the data become more strongly correlated, even though other (computationally inefficient!) estimators do not. For data generated by simple linear structural equation models this can lead the LASSO to have suboptimal performance. We show that under a low-treewidth condition on the graphical structure of the covariates, there exists a preconditioner which can 'fix' the performance of the LASSO to be nearly statistically optimal; the condition is also shown to be necessary in a certain sense. Based on joint work with Jon Kelner, Raghu Meka, and Dhruv Rohatgi.
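
The preconditioning idea can be sketched as follows. Note this toy uses naive whitening by the inverse square root of the empirical covariance purely for illustration; the paper instead constructs its preconditioner from the low-treewidth graphical structure.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 5

# Covariates from a chain SCM, X_i = 0.9 * X_{i-1} + noise: a long
# dependency chain, exactly the correlated regime where LASSO degrades.
X = np.zeros((n, d))
X[:, 0] = rng.standard_normal(n)
for i in range(1, d):
    X[:, i] = 0.9 * X[:, i - 1] + 0.3 * rng.standard_normal(n)

Sigma = np.cov(X, rowvar=False)

# Inverse square root of Sigma via its eigendecomposition.
w, V = np.linalg.eigh(Sigma)
P = V @ np.diag(w ** -0.5) @ V.T
Xp = X @ P  # preconditioned design

corr = np.corrcoef(Xp, rowvar=False)
# Off-diagonal correlations vanish after preconditioning.
assert np.abs(corr - np.eye(d)).max() < 1e-8
```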

0:28:50

Anish Agarwal (UC Berkeley)

https://simons.berkeley.edu/talks/causal-matrix-completion

Algorithmic Aspects of Causal Inference

Matrix completion is the study of recovering an underlying matrix from a sparse subset of noisy observations. Traditionally, it is assumed that the entries of the matrix are “missing completely at random” (MCAR), i.e., each entry is revealed at random, independent of everything else, with uniform probability. This is likely unrealistic due to the presence of “latent confounders”, i.e., unobserved factors that determine both the entries of the underlying matrix and the missingness pattern in the observed matrix. For example, in the context of movie recommender systems—a canonical application for matrix completion—a user who vehemently dislikes horror films is unlikely to ever watch horror films. In general, these confounders yield “missing not at random” (MNAR) data, which can severely impact any inference procedure that does not correct for this bias. We develop a formal causal model for matrix completion through the language of potential outcomes, and provide novel identification arguments for a variety of causal estimands of interest. We design a procedure, which we call “synthetic nearest neighbors” (SNN), to estimate these causal estimands. We prove finite-sample consistency and asymptotic normality of our estimator. Our analysis also leads to new theoretical results for the matrix completion literature. In particular, we establish entry-wise, i.e., max-norm, finite-sample consistency and asymptotic normality results for matrix completion with MNAR data. As a special case, this also provides entry-wise bounds for matrix completion with MCAR data. Across simulated and real data, we demonstrate the efficacy of our proposed estimator.
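
The MCAR/MNAR distinction can be seen in a few lines (toy numbers, not the paper's SNN estimator): when missingness depends on the entry values themselves, as with users skipping films they dislike, even the plain observed mean is biased.

```python
import numpy as np

rng = np.random.default_rng(0)
ratings = rng.uniform(0, 5, size=(500, 200))  # hypothetical full rating matrix

mcar_mask = rng.random(ratings.shape) < 0.3            # reveal 30% uniformly
mnar_mask = rng.random(ratings.shape) < ratings / 5.0  # high ratings revealed more often

true_mean = ratings.mean()
mcar_mean = ratings[mcar_mask].mean()
mnar_mean = ratings[mnar_mask].mean()

assert abs(mcar_mean - true_mean) < 0.05  # MCAR: observed mean tracks the truth
assert mnar_mean - true_mean > 0.5        # MNAR: observed mean biased upward
```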

0:35:01

Kavita Ramanan (Brown University)

https://simons.berkeley.edu/talks/parameter-estimation-undirected-graphical-models-hard-constraints

Algorithmic Aspects of Causal Inference

Parameterized families of finite-state Markov random fields (equivalently, undirected graphical models) with hard constraints (in which certain configurations are forbidden) arise in a variety of applications including communication networks and statistical physics. The standard maximum likelihood method for estimating parameters from a single sample is computationally intractable for such models. We describe an alternative algorithm that is computationally tractable and also establish its asymptotic consistency. The proofs entail applications of the method of exchangeable pairs as well as combinatorial arguments.
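
A standard example of such a hard-constraint model (illustrative, not specific to the talk) is the hard-core model, where any configuration placing particles on adjacent vertices gets weight zero and the fugacity parameter lam is what one would estimate:

```python
import itertools

edges = [(0, 1), (1, 2), (2, 3)]  # the path P4
lam = 2.0                          # fugacity parameter of the hard-core model

def weight(config):
    if any(config[u] and config[v] for u, v in edges):
        return 0.0                 # hard constraint: adjacent pair forbidden
    return lam ** sum(config)

Z = sum(weight(c) for c in itertools.product([0, 1], repeat=4))

# Independent sets of P4 by size: one empty set, four singletons, and
# three pairs, so Z = 1 + 4*lam + 3*lam**2.
assert abs(Z - (1 + 4 * lam + 3 * lam ** 2)) < 1e-12
```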

0:42:55

Arnab Bhattacharyya (National University of Singapore)

https://simons.berkeley.edu/talks/efficient-distance-estimation-and-causal-inference-discrete-models

Algorithmic Aspects of Causal Inference

I will discuss two basic algorithmic problems in statistical and causal inference. The first problem is to approximate the distance between two Bayesian networks. The second problem is to approximately learn an interventional distribution from observational samples. For both problems, assuming that all variables are over a finite domain, I will describe efficient algorithms with finite sample guarantees as well as computational hardness for various parameter regimes. Joint work with Pavan Aduri, Sutanu Gayen, Kuldeep Meel, Dimitrios Myrisiotis, and NV Vinodchandran.
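
For intuition about the first problem, here is a brute-force total variation computation between two tiny Bayesian networks X → Y over bits (parameters invented). The talk's algorithms, by contrast, come with finite sample guarantees and avoid enumerating exponentially large domains.

```python
import itertools

def joint(px, py_given_x):
    # Joint distribution of the two-node network X -> Y; px = P(X = 1),
    # py_given_x[x] = P(Y = 1 | X = x).
    return {(x, y): (px if x else 1 - px) *
                    (py_given_x[x] if y else 1 - py_given_x[x])
            for x, y in itertools.product([0, 1], repeat=2)}

p = joint(0.5, {0: 0.2, 1: 0.9})
q = joint(0.5, {0: 0.2, 1: 0.6})  # same structure, different CPT at Y

tv = 0.5 * sum(abs(p[k] - q[k]) for k in p)
assert abs(tv - 0.15) < 1e-12
```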

0:40:26

Leonard J. Schulman (Caltech)

https://simons.berkeley.edu/talks/identifying-mixture-models

Algorithmic Aspects of Causal Inference

We give an algorithm for source identification of a mixture of k product distributions on n bits. This problem (which we call k-MixProd) is a fundamental problem in machine learning with many applications. Our algorithm identifies the source parameters of an identifiable mixture, given, as input, approximate values of multilinear moments; the complexity is ~exp(k log k) arithmetic operations and a similar sample size. Our analysis gives a quantitative version of a qualitative characterization of identifiable sources due to Tahmasebi, Motahari, and Maddah-Ali (2018). Key to our work are a method of "synthetic bits" and a condition number lower bound for Hadamard extensions. Based on joint work with Chaitanya Swamy (Waterloo), Yuval Rabani (Hebrew U), Jian Li (Tsinghua), Spencer Gordon (Caltech), and Bijan Mazaheri (Caltech).
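
The multilinear moments that serve as the algorithm's input have a simple form (parameters below invented, and this is only the forward computation, not the identification algorithm): each moment is a mixing-weight combination of products of per-constituent marginals.

```python
import numpy as np

weights = np.array([0.6, 0.4])        # k = 2 mixing weights
params = np.array([[0.9, 0.2, 0.5],   # P(X_i = 1) in constituent 1
                   [0.1, 0.8, 0.5]])  # P(X_i = 1) in constituent 2

def moment(S):
    # E[prod_{i in S} X_i] = sum_j w_j * prod_{i in S} p_{j,i},
    # since bits are independent within each constituent.
    return float(weights @ np.prod(params[:, S], axis=1))

m01 = moment([0, 1])
# The mixture itself is not a product distribution:
# E[X0 X1] != E[X0] * E[X1].
assert abs(m01 - moment([0]) * moment([1])) > 0.01
```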

0:42:31

Yuval Rabani (The Hebrew University of Jerusalem)

https://simons.berkeley.edu/talks/identifying-mixtures-bayesian-network-distributions

Algorithmic Aspects of Causal Inference

Bayesian Network distributions are fundamental to research in causal inference. We consider finite mixtures of such models, which are projections onto the observable variables of a Bayesian Network distribution on a larger graph that has an additional hidden random variable U, ranging over {1, 2, ..., k}, and a directed edge from U to every other vertex. Thus, the confounding variable U selects the mixture constituent that determines the joint distribution of the observable variables. We give the first algorithm for identifying such mixtures that can handle the case of non-empty graphs. The complexity for a graph of maximum degree ∆ (ignoring the degree of U) is roughly exponential in the number of mixture constituents k and in ∆² (suppressing dependence on secondary parameters).
