Structural Causal Models

To demonstrate the fuctionality in this package, setup of the 'marks' example from the ggm R package:

using StructuralCausalModels

ProjDir = @__DIR__
cd(ProjDir)

df = DataFrame!(CSV.File(scm_path("..", "data", "marks.csv")));

DAG - directed acyclic graphs

DAG() accepts either an OrderedDict, an adjacency_matrix or a ggm/dagitty string. See the ancestral_graph section below for an example using an adjacency_matrix.

Below d_string holds a ggm DAG definition.

d_string = "DAG(
    mechanics ~ vectors+algebra, 
    vectors ~ algebra, 
    statistics ~ algebra+analysis, 
    analysis ~ algebra)"

dag = DAG("marks", d_string; df=df);
show(dag)

The d_string can also contain a dagitty causal model.

# fig2.6.dag <- dagitty("dag { {X V} -> U; S1 <- U; {Y V} -> W; S2 <- W}")
dag = DAG("fig_2_6", "dag {{X V} -> U; S1 <- U; {Y V} -> W; S2 <- W}")

In the REPL show(dag) will display:

DAG object:

name = "marks"
vars = [:mechanics, :vectors, :algebra, :statistics, :analysis]

OrderedDict{Symbol,Union{Nothing, Array{Symbol,1}, Symbol}} with 4 entries:
  :mechanics  => [:vectors, :algebra]
  :vectors    => :algebra
  :statistics => [:algebra, :analysis]
  :analysis   => :algebra

Internally, a DAG object will always contain an OrderedDict representation of the DAG. This representation is used in all functions. In the definition of the OrderedDict, read => as ~ in regression models or <- in causal models.

Optional display the DAG using GraphViz:

fname = ProjDir * "/marks.dot"
to_graphviz(dag, fname)
Sys.isapple() && run(`open -a GraphViz.app $(fname)`)

The DAG pdf is here.

In this example a DataFrame with observed values has been provided and the related covariance matrix will be computed and stored in the DAG object:

display(dag.s)

5×5 Named Array{Float64,2}
Rows ╲ Cols │  :mechanics     :vectors     :algebra    :analysis  :statistics
────────────┼────────────────────────────────────────────────────────────────
:mechanics  │     305.768      127.223      101.579      106.273      117.405
:vectors    │     127.223      172.842      85.1573      94.6729       99.012
:algebra    │     101.579      85.1573      112.886      112.113      121.871
:analysis   │     106.273      94.6729      112.113       220.38      155.536
:statistics │     117.405       99.012      121.871      155.536      297.755

Additional DAG related functions are adjacency_matrix(), edge_matrix() and dag_vars().

Importing from and exporting to dagitty.net, dagitty and ggm (both are R packages)

Importing is easiest implicitly using the functions from_dagitty() and from_ggm() while constructing a DAG object, as shown above.

To export to dagitty.net, copy and paste the output from to_dagitty() into the Model code field on the dagitty.net web interface.

For both R packages, copy the output from to_dagitty() or to_ggm() to R.

Adding a observations DataFrame or a covariance matrix

Use set_dag_df!() and set_dag_cov_matrix!() for this. Note that if a DataFrame is added a covariance matrix is computed.

Although this initial version of StructuralCausalModels does not support latent variables yet, by using the keyword argument force=true no check is performed if all vertices/variables in the causal diagram are present in the DataFrame or covariance matrix.

Directed separation

Given a causal graph dag, d_separation(dag, f, l, cond) determines if the vertices in set f are d-separated from the vertices in set l given the conditioning set cond.

Show several d_separation results for the marks model:

f = [:statistics]; s = [:mechanics];

e = d_separation(dag, f, s; cond=:algebra);
println("d_separation($(dag.name), $f, $s; cond=:algebra) = $e")

e = d_separation(dag, f, s);
println("d_separation($(dag.name), $f, $s) = $e")

print("d_separation($(dag.name), [:statistics], [:mechanics, :analysis]; cond=[:algebra]) = ");
println(d_separation(dag, [:statistics], [:mechanics, :analysis]; cond=[:algebra]))

will produce:

d_separation(marks, [:statistics], [:mechanics]; cond=:algebra) = false
d_separation(marks, [:statistics], [:mechanics]) = true
d_separation(marks, [:statistics], [:mechanics, :analysis]: cond=[:algebra]) = false

Basis set

A minimal set of d_separation statements is called a basis_set.

A basis_set is not necessarily unique but it is sufficient to predict the complete set of d_separation statements.

Compute the basis_set:

bs = basis_set(dag)
display(bs)

BasisSet[
  vectors ∐ analysis | [:algebra]
  vectors ∐ statistics | [:algebra, :analysis]
  mechanics ∐ analysis | [:algebra, :vectors]
  mechanics ∐ statistics | [:algebra, :vectors, :analysis]
]

Adjustment sets

The function basis_set() provides a set of conditional independencies given the causal model. The conditioning set closes ("blocks the flow of causal info") all paths. The conditioning_set can be empty. It provides ways to test the chosen causal model given observational data.

The function adjustment_sets() answers a related question, i.e. how to prevent confounding in multiple regression models assuming the chosen causal model is correct.

Setup the WaffleDivorce example from StatisticalRethinking:

using StructuralCausalModels

ProjDir = @__DIR__
cd(ProjDir) #do

df = DataFrame!(CSV.File(scm_path("..", "data", "WaffleDivorce.csv"), delim=';');
df = DataFrame(
  :s => df[:, :South],
  :a => df[:, :MedianAgeMarriage],
  :m => df[:, :Marriage],
  :w => df[:, :WaffleHouses],
  :d => df[:, :Divorce]
);

fname = scm_path("..", "examples", "SR", "SR6.4.3", "sr6.4.3.dot")
Sys.isapple() && run(`open -a GraphViz.app $(fname)`)

d = OrderedDict(
  [:a, :m, :w] => :s,
  :d => [:w, :m, :a],
  :m => [:a]
);
u = []

dag = DAG("sr6_4_3", d, df);
show(dag)

adjustmentsets = adjustment_sets(dag, :w, :d)
println("Adjustment sets for open paths: $(openpaths)\n")
adjustmentsets |> display

Adjustment sets:
2-element Array{Array{Symbol,1},1}:
 [:s]
 [:a, :m]

Shipley test

Perform the Shipley test:

t = shipley_test(dag)
display(t)

(ctest = 2.816854003338401, df = 8, pv = 0.9453198036802164)

Paths

Adjustment_sets is based on several path manipulations and checks:

allpaths  = all_paths(dag, :w, :d)
println("All paths between :w and :d:")
allpaths |> display
println()

for this DAG returns:

4-element Array{Array{Symbol,1},1}:
 [:w, :s, :a, :d]
 [:w, :s, :a, :m, :d]
 [:w, :s, :m, :d]
 [:w, :s, :m, :a, :d]

backdoorpaths = backdoor_paths(dag, allpaths, :w)
println("All backdoors between :w and :d:")
backdoorpaths |> display
println()

4-element Array{Array{Symbol,1},1}:
 [:w, :s, :a, :d]
 [:w, :s, :a, :m, :d]
 [:w, :s, :m, :d]
 [:w, :s, :m, :a, :d]

println("Show path: $(allpaths[2])")
show_dag_path(dag, allpaths[2]) |> display
println()

":w ⇐ :s ⇒ :a ⇒ :m ⇒ :d"

Ancestral graph

Setup an ancestral_graph example:

using StructuralCausalModels, Test

ProjDir = @__DIR__

#include(scm_path("test_methods", "test_ag.jl"))

amat_data = transpose(reshape([
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,
  0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,
  0,0,0,0,1,0,1,0,1,1,0,0,0,0,0,0,
  1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,
  0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0
], (16,16)));

vars = [Symbol("n$i") for i in 1:size(amat_data, 1)]
a = NamedArray(Int.(amat_data), (vars, vars), ("Rows", "Cols"));

dag = DAG("ag_example", a)

m = [:n3, :n5, :n6, :n15, :n16];
c = [:n4, :n7];

fr = StructuralCausalModels.test_ag(a; m=m, c=c)

fr1 = ancestral_graph(a; m=m, c=c)
@test all(fr .== fr1)

fr2 = StructuralCausalModels.test_ag(dag.a; m=m, c=c)
@test all(fr .== fr2);

println()
display(fr)
println()