add comparison to existing packages
salbalkus committed Jan 7, 2025
1 parent ecb3ca5 commit 9f484c7
Showing 3 changed files with 103 additions and 42 deletions.
41 changes: 41 additions & 0 deletions paper/paper.bib
@@ -148,3 +148,44 @@ @misc{Schauer2024
copyright = {Creative Commons Attribution 4.0 International}
}

@manual{squires2018causaldag,
title={{\texttt{causaldag}: creation, manipulation, and learning of causal models}},
author={Squires, Chandler},
year={2018},
url={https://github.com/uhlerlab/causaldag},
}

@article{Textor2017,
title={Robust causal inference using directed acyclic graphs: the R package ‘dagitty’},
ISSN={1464-3685},
url={http://dx.doi.org/10.1093/ije/dyw341},
DOI={10.1093/ije/dyw341},
journal={International Journal of Epidemiology},
publisher={Oxford University Press (OUP)},
author={Textor, Johannes and van der Zander, Benito and Gilthorpe, Mark S. and Liśkiewicz, Maciej and Ellison, George T.H.},
year={2017},
month=jan,
pages={dyw341}
}

@book{tlverse,
title={Targeted Learning in R: Causal Data Science with the tlverse Software Ecosystem},
author={Mark {van der Laan} and Jeremy Coyle and Nima Hejazi and Ivana Malenica and Rachael Phillips and Alan Hubbard},
year={2009},
publisher={Cambridge University Press},
url={https://tlverse.org/tlverse-handbook/}
}

@article{Chen2020,
title = {{CausalML}: Python Package for Causal Machine Learning},
author = {Chen, Huigang and Harinen, Totte and Lee, Jeong-Yoon and
Yung, Mike and Zhao, Zhenyu},
year = {2020},
journal = {arXiv preprint arXiv:2002.11631}
}

@article{dowhy,
title={DoWhy: An End-to-End Library for Causal Inference},
author={Sharma, Amit and Kiciman, Emre},
journal={arXiv preprint arXiv:2011.04216},
year={2020}
}


98 changes: 57 additions & 41 deletions paper/paper.md
@@ -28,18 +28,18 @@ toc-title: Table of contents
Estimating the strength of causal relationships between variables is an
important problem across many scientific disciplines. `CausalTables.jl`
supports the development of new statistical methods for causal inference
in Julia by providing tools to (1) easily store and process data endowed
with causal structure and (2) simulate data from causal models for
experimental testing. Firstly, the package implements a `CausalTable`
structure that stores features annotated with causal labels in a
`Tables.jl`-compatible format. Its interface includes causal-relevant
functions, such as extracting relevant variables and applying
interventions on treatment. Secondly, `CausalTables.jl` introduces a
`StructuralCausalModel` for randomly generating data from user-specified
causal models and computing ground truth parameters under the given
experiment. Together, these functionalities expand the Julia ecosystem
by supporting the development and experimental assessment of the growing
number of causal inference methods.
in Julia by providing tools to (1) easily store and process tabular data
endowed with causal structure and (2) simulate data from causal models
for experimental testing. Firstly, the package implements a
`CausalTable` structure that stores features annotated with causal
labels in a `Tables.jl`-compatible format. Its interface includes
causal-relevant functions, such as extracting relevant variables and
applying interventions on treatment. Secondly, `CausalTables.jl`
introduces a `StructuralCausalModel` for randomly generating data from
user-specified causal models and computing ground truth parameters under
the given experiment. Together, these functionalities expand the Julia
ecosystem by supporting the development and experimental assessment of
the growing number of causal inference methods.

# Statement of need

@@ -49,17 +49,12 @@ relationships between variables from observed data
[@pearl2009causality; @hernan2020causal]; causal inference techniques
have helped applied scientists and decision-makers better understand
important phenomena in fields ranging from health and medicine to
politics and economics. As interest in causal inference continues to
grow across many disciplines, so too does the development of software
tools for estimating causal effects. While Julia packages for causal
inference have begun to emerge---with examples including `TMLE.jl`
[@TMLE.jl] and `CausalELM.jl`[@CausalELM.jl] for estimation and
`CausalInference.jl` [@Schauer2024] for causal discovery---the ecosystem
is still in its infancy. New methods for causal inference are being
developed at a rapid pace, underscoring the need for tools designed to
support their development. `CausalTables.jl` aims to provide such a tool
for the Julia language. Currently, attempts to implement and test causal
inference methods in Julia face two major challenges.
politics and economics. New methods for causal inference are being
developed at a rapid pace, but no auxiliary tools currently exist to
support their development in the Julia language. `CausalTables.jl` aims
to provide such a tool. Presently, attempts to implement and test causal
inference methods in Julia face two major challenges.

First, causal inference requires data to be preprocessed in various ways
based on the underlying causal structure. Suppose one were to write
@@ -92,16 +87,37 @@ of several common causal effect parameters.

By addressing these two major challenges---preprocessing and
simulation--- `CausalTables.jl` simplifies and accelerates the
development of tools for statistical causal inference in Julia. The
`CausalTable` interface extends `Tables.jl`, the most common interface
for accessing tabular data in Julia [@quinn2024tables]. The SCM
framework operates in conjunction with `Distributions.jl`, the primary
Julia package for working with random variables
development of tools for statistical causal inference on tabular data in
Julia. The `CausalTable` interface extends `Tables.jl`, the most common
interface for accessing tabular data in Julia [@quinn2024tables]. The
SCM framework operates in conjunction with `Distributions.jl`, the
primary Julia package for working with random variables
[@JSSv098i16; @Distributions.jl-2019]. By integrating seamlessly with
other commonly used packages in the Julia ecosystem, `CausalTables.jl`
ensures both compatibility and ease of use for statisticians and applied
scientists alike.

# Comparison to existing packages

As interest in causal inference continues to grow across disciplines, so
too does the development of software tools for estimating causal effects.
While a multitude of methods have been implemented in the R and Python
languages (for instance, [@tlverse] or [@Chen2020]), Julia has seen
relatively few. Recent Julia packages for causal inference include
`TMLE.jl` [@TMLE.jl] and `CausalELM.jl` [@CausalELM.jl]. These packages
focus on estimation techniques using tabular data: they implement
specific ways to label causal structure, but do not provide a general
simulation or causal-specific data processing interface like
`CausalTables.jl`. On the other hand, `CausalInference.jl`
[@Schauer2024] provides an interface for representing causal graphs and
implements causal discovery algorithms, similar to CausalDAG
[@squires2018causaldag] or DoWhy [@dowhy] in Python and dagitty
[@Textor2017] in R. However, `CausalInference.jl` is generally
incompatible with the tabular data format required by statistical tools
and cannot simulate data. In fact, as far as we are aware,
`CausalTables.jl` is the first package for simulating and extracting
ground-truth causal estimands from an existing SCM in Julia.

# Instructional use cases

A standard causal inference problem is to estimate the effect of one
@@ -338,19 +354,19 @@ mean(conmean(scm, ct_intervened, :Y) .- responsematrix(ct))
# Closing remarks

`CausalTables.jl` provides useful auxiliary functions to support causal
inference methods in Julia. The package focuses on tools relevant to
estimating the effect of one or more treatment variables on a response.
The `StructuralCausalModel` allows users to easily extract ground truth
values for any relevant aspect of a data-generating process, supporting
the benchmarking of many common causal inference methods. While the
package includes high-level functions to approximate several prominent
estimands, users can also write their own interventions and use
low-level functions such as `intervene`, `draw_counterfactual`, and
`condensity` to approximate the ground truth of novel estimands. By
combining this with the power of the `CausalTable` interface for
processing data once it is generated, `CausalTables.jl` serves as a
useful tool for scientists seeking to develop and experimentally
evaluate new causal inference methods.
inference methods on tabular data in Julia. The package focuses on tools
relevant to estimating the effect of one or more treatment variables on
a response. The `StructuralCausalModel` allows users to easily extract
ground truth values for any relevant aspect of a data-generating
process, supporting the benchmarking of many common causal inference
methods. While the package includes high-level functions to approximate
several prominent estimands, users can also write their own
interventions and use low-level functions such as `intervene`,
`draw_counterfactual`, and `condensity` to approximate the ground truth
of novel estimands. By combining this with the power of the
`CausalTable` interface for processing data once it is generated,
`CausalTables.jl` serves as a useful tool for scientists seeking to
develop and experimentally evaluate new causal inference methods.

# Acknowledgements

6 changes: 5 additions & 1 deletion paper/paper.qmd
@@ -28,14 +28,18 @@ Estimating the strength of causal relationships between variables is an importan

# Statement of need

The quantitative science of causal inference has emerged over the past three decades as a set of formalisms for studying cause-and-effect relationships between variables from observed data [@pearl2009causality; @hernan2020causal]; causal inference techniques have helped applied scientists and decision-makers better understand important phenomena in fields ranging from health and medicine to politics and economics. As interest in causal inference continues to grow across many disciplines, so too does the development of software tools for estimating causal effects. While Julia packages for causal inference have begun to emerge---with examples including `TMLE.jl` [@TMLE.jl] and `CausalELM.jl`[@CausalELM.jl] for estimation and `CausalInference.jl` [@Schauer2024] for causal discovery---the ecosystem is still in its infancy. New methods for causal inference are being developed at a rapid pace, but there currently do not exist auxiliary tools designed to support their development in the Julia language. `CausalTables.jl` aims to provide such a tool. Currently, attempts to implement and test causal inference methods in Julia face two major challenges.
The quantitative science of causal inference has emerged over the past three decades as a set of formalisms for studying cause-and-effect relationships between variables from observed data [@pearl2009causality; @hernan2020causal]; causal inference techniques have helped applied scientists and decision-makers better understand important phenomena in fields ranging from health and medicine to politics and economics. New methods for causal inference are being developed at a rapid pace, but no auxiliary tools currently exist to support their development in the Julia language. `CausalTables.jl` aims to provide such a tool. Presently, attempts to implement and test causal inference methods in Julia face two major challenges.

First, causal inference requires data to be preprocessed in various ways based on the underlying causal structure. Suppose one were to write their own method by building on existing statistical packages in Julia. Using `MLJ.jl` [@blaom2020mlj] would necessitate extracting the treatment and response as Vectors and the variables hypothesized to cause them as Tables; meanwhile, using `GLM.jl` [@bates2023glm] would require the same but as Matrix or `Formula` objects. This challenge has also led to differences in the API for existing causal methods: for instance, `CausalELM.jl` [@CausalELM.jl] requires the user to split apart treatment and response variables as individual vectors, while `TMLE.jl` requires the entire dataset in a `Tables.jl`-compliant format with treatment and response variables being labeled via strings or symbols. `CausalTables.jl` provides a `CausalTable` interface that, by packaging the data and auxiliary causal knowledge together, allows extracting relevant causal components in multiple ways. This simplifies both writing new packages and processing data to evaluate existing packages.
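
The idea of packaging data together with causal labels can be sketched in a language-agnostic way. The following is a minimal illustration in plain Python, not the `CausalTables.jl` API: the class `LabeledTable`, its methods, and the toy values are all hypothetical names invented for this example. It shows how one labeled object can hand the same data to a vector-based, matrix-based, or table-based estimator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledTable:
    """Hypothetical stand-in for a causally labeled table (not the CausalTables.jl API)."""
    columns: dict        # column name -> list of values, all of equal length
    treatment: str       # name of the treatment column
    response: str        # name of the response column
    confounders: tuple   # names of the confounder columns

    def vector(self, name):
        """Return one column as a plain list, the form a vector-based estimator expects."""
        return list(self.columns[name])

    def matrix(self, names):
        """Return the named columns as a list of rows, a matrix-like form."""
        return [list(row) for row in zip(*(self.columns[n] for n in names))]

    def subtable(self, names):
        """Return the named columns as a smaller column table."""
        return {n: list(self.columns[n]) for n in names}


# Toy data, invented purely for illustration.
data = LabeledTable(
    columns={"W": [0.2, 0.5, 0.9], "A": [0, 1, 1], "Y": [1.1, 2.4, 3.0]},
    treatment="A",
    response="Y",
    confounders=("W",),
)

treatment_vec = data.vector(data.treatment)                  # treatment as a vector
design = data.matrix(data.confounders + (data.treatment,))   # confounders and treatment as rows
confounder_table = data.subtable(data.confounders)           # confounders as a table
```

Because the labels travel with the data, downstream code never has to be told again which column plays which causal role; it only chooses the output shape it needs.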

Second, testing the performance of new estimators often requires simulating data for numerical experiments from a Structural Causal Model (SCM) [@pearl2009causality] so as to compare them to an underlying ground truth (encoded via interventions on the SCM). An SCM defines causal structure by envisaging a data-generating process as random draws from a sequence of non-parametric structural equations, with each draw depending on realizations from draws preceding it. `CausalTables.jl` provides a simple, user-friendly way to define an SCM, sample data randomly from it, and compute or approximate the underlying true values of several common causal effect parameters.
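
To make the sequential-draw idea concrete, here is a minimal sketch in plain Python using only the standard library. It is not the `CausalTables.jl` interface: the function `draw`, the distributions, and the coefficient 2 are assumptions chosen for illustration. Each variable is drawn from a structural equation that depends on the realizations before it, and the ground truth of a shift intervention is approximated by redrawing the same units with the treatment shifted.

```python
import random
import statistics


def draw(n, shift=0.0, seed=1):
    """Sample n units from a toy SCM, adding `shift` to every treatment value."""
    rng = random.Random(seed)  # reusing the seed reuses the exogenous noise
    rows = []
    for _ in range(n):
        w = rng.betavariate(2, 4)          # W ~ Beta(2, 4)
        a = w + rng.gauss(0, 1) + shift    # A = W + noise, then intervened on
        y = 2 * a + w + rng.gauss(0, 1)    # Y = 2A + W + noise
        rows.append((w, a, y))
    return rows


observed = draw(5000)               # data under the natural treatment
shifted = draw(5000, shift=1.0)     # same units, treatment shifted by +1

# Mean difference in response between the two regimes. The structural
# equation for Y implies a true value of 2 for a unit shift in A.
effect = statistics.fmean(
    ys - yo for (_, _, ys), (_, _, yo) in zip(shifted, observed)
)
```

Holding the exogenous noise fixed across the two draws is a design choice: it isolates the effect of the intervention from sampling variability, so `effect` recovers the structural coefficient up to floating-point rounding. An estimator under test would see only `observed` and be scored against that value.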

By addressing these two major challenges---preprocessing and simulation--- `CausalTables.jl` simplifies and accelerates the development of tools for statistical causal inference on tabular data in Julia. The `CausalTable` interface extends `Tables.jl`, the most common interface for accessing tabular data in Julia [@quinn2024tables]. The SCM framework operates in conjunction with `Distributions.jl`, the primary Julia package for working with random variables [@JSSv098i16; @Distributions.jl-2019]. By integrating seamlessly with other commonly used packages in the Julia ecosystem, `CausalTables.jl` ensures both compatibility and ease of use for statisticians and applied scientists alike.

# Comparison to existing packages

As interest in causal inference continues to grow across disciplines, so too does the development of software tools for estimating causal effects. While a multitude of methods have been implemented in the R and Python languages (for instance, [@tlverse] or [@Chen2020]), Julia has seen relatively few. Recent Julia packages for causal inference include `TMLE.jl` [@TMLE.jl] and `CausalELM.jl` [@CausalELM.jl]. These packages focus on estimation techniques using tabular data: they implement specific ways to label causal structure, but do not provide a general simulation or causal-specific data processing interface like `CausalTables.jl`. On the other hand, `CausalInference.jl` [@Schauer2024] provides an interface for representing causal graphs and implements causal discovery algorithms, similar to CausalDAG [@squires2018causaldag] or DoWhy [@dowhy] in Python and dagitty [@Textor2017] in R. However, `CausalInference.jl` is generally incompatible with the tabular data format required by statistical tools and cannot simulate data. In fact, as far as we are aware, `CausalTables.jl` is the first package for simulating and extracting ground-truth causal estimands from an existing SCM in Julia.

# Instructional use cases

A standard causal inference problem is to estimate the effect of one treatment variable $A$ on a response variable $Y$ in the presence of confounders $W$. One can evaluate the performance of causal inference methods in two ways: either by imposing a causal structure on an existing dataset, or by drawing new data randomly from a programmatically defined SCM.