Jean Kaddour*, Aengus Lynch*, Qi Liu, Matt J. Kusner, Ricardo Silva

* equal contribution

Abstract

Causal Machine Learning (CausalML) is an umbrella term for machine learning methods that formalize the data-generation process as a structural causal model (SCM). This perspective enables us to reason about the effects of changes to this process (interventions) and what would have happened in hindsight (counterfactuals). We categorize work in CausalML into five groups according to the problems they address: (1) causal supervised learning, (2) causal generative modeling, (3) causal explanations, (4) causal fairness, and (5) causal reinforcement learning. We systematically compare the methods in each category and point out open problems. Further, we review data-modality-specific applications in computer vision, natural language processing, and graph representation learning. Finally, we provide an overview of causal benchmarks and a critical discussion of the state of this nascent field, including recommendations for future work.

Causality: A Minimal Introduction

We give a minimal, self-contained introduction to the key concepts in causality.
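To make the three notions from the abstract concrete (observation, intervention, counterfactual), here is a minimal sketch of a toy linear SCM. It is not taken from the survey: the graph (Z causes X and Y, X causes Y), the coefficients, and the helper names `sample_scm`, `ols_slope`, and `counterfactual_y` are all assumptions chosen for illustration. With these coefficients, regressing Y on X in observational data yields a slope of about 5, whereas the true causal effect of X on Y is 3, because Z confounds the two.

```python
import random

def sample_scm(rng, do_x=None):
    """Draw one sample from a toy SCM (hypothetical, for illustration only).

    Structural equations:
        Z := U_z
        X := 2*Z + U_x          (replaced by the constant do_x under do(X = do_x))
        Y := 3*X + 5*Z + U_y
    """
    u_z, u_x, u_y = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    z = u_z
    x = 2.0 * z + u_x if do_x is None else do_x  # intervention cuts the Z -> X edge
    y = 3.0 * x + 5.0 * z + u_y
    return z, x, y

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs)
    return cov / var

def counterfactual_y(z, x, y, x_new):
    """Counterfactual Y for one observed unit, had X been x_new."""
    u_y = y - 3.0 * x - 5.0 * z          # abduction: recover the unit's noise term
    return 3.0 * x_new + 5.0 * z + u_y   # action + prediction with the same noise

rng = random.Random(0)
n = 20000

# Observational association: confounded by Z.
obs = [sample_scm(rng) for _ in range(n)]
obs_slope = ols_slope([x for _, x, _ in obs], [y for _, _, y in obs])

# Interventional contrast: E[Y | do(X=1)] - E[Y | do(X=0)].
y_do1 = sum(sample_scm(rng, do_x=1.0)[2] for _ in range(n)) / n
y_do0 = sum(sample_scm(rng, do_x=0.0)[2] for _ in range(n)) / n
causal_effect = y_do1 - y_do0

print(f"observational slope: {obs_slope:.2f}")
print(f"interventional effect: {causal_effect:.2f}")
```

The counterfactual helper works at the level of a single unit: for an observed unit with `z=1.0, x=2.0, y=12.0`, calling `counterfactual_y(1.0, 2.0, 12.0, 0.0)` reuses that unit's recovered noise term instead of resampling it, which is what separates a counterfactual from an interventional query.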

Causal Supervised Learning

improves predictive generalization by learning invariant features or mechanisms, with the aim of removing models’ reliance on spurious associations.

Causal Generative Modeling

supports sampling from interventional or counterfactual distributions, naturally performing controllable generation or sample editing tasks, respectively.

Causal Explanations

clarify model predictions while accounting for the causal structure of either (i) the model mechanics or (ii) the underlying data.

Causal Fairness

mitigates harmful disparities, such as demographic biases, with respect to the causal relationships in the underlying data.

Causal Reinforcement Learning

uses the causal structure of the environment for decision-making. Potential benefits include sample efficiency, accounting for unobserved confounding in partially observable state spaces, and analyzing agent incentives.

Questions or Feedback?

jean dot kaddour dot 20 at ucl dot ac dot uk

aengus dot lynch dot 17 at ucl dot ac dot uk

Citation

@article{kaddour2022causal,
  title={Causal Machine Learning: A Survey and Open Problems},
  author={Jean Kaddour and Aengus Lynch and Qi Liu and Matt J. Kusner and Ricardo Silva},
  year={2022},
  url={https://arxiv.org/abs/2206.15475},
  journal={arXiv preprint arXiv:2206.15475},
}