clip_45_47s

Load the Julia packages (libraries) needed for the snippets in chapter 4

using StatisticalRethinking
using CmdStan, StanMCMCChain
gr(size=(500,500))
Plots.GRBackend()

CmdStan uses a tmp directory to store the output of cmdstan

ProjDir = rel_path("..", "chapters", "04")
cd(ProjDir)

snippet 4.7

howell1 = CSV.read(rel_path("..", "data", "Howell1.csv"), delim=';')
df = convert(DataFrame, howell1);

Use only adults

df2 = filter(row -> row[:age] >= 18, df)

352 rows × 4 columns (first 24 rows shown)

 Row │ height    weight    age      male
     │ Float64⍰  Float64⍰  Float64⍰ Int64⍰
─────┼──────────────────────────────────────
   1 │ 151.765   47.8256   63.0     1
   2 │ 139.7     36.4858   63.0     0
   3 │ 136.525   31.8648   65.0     0
   4 │ 156.845   53.0419   41.0     1
   5 │ 145.415   41.2769   51.0     0
   6 │ 163.83    62.9926   35.0     1
   7 │ 149.225   38.2435   32.0     0
   8 │ 168.91    55.48     27.0     1
   9 │ 147.955   34.8699   19.0     0
  10 │ 165.1     54.4877   54.0     1
  11 │ 154.305   49.8951   47.0     0
  12 │ 151.13    41.2202   66.0     1
  13 │ 144.78    36.0322   73.0     0
  14 │ 149.9     47.7      20.0     0
  15 │ 150.495   33.8493   65.3     0
  16 │ 163.195   48.5627   36.0     1
  17 │ 157.48    42.3258   44.0     1
  18 │ 143.942   38.3569   31.0     0
  19 │ 161.29    48.9879   39.0     1
  20 │ 156.21    42.7227   29.0     0
  21 │ 146.4     35.4936   56.0     1
  22 │ 148.59    37.9033   45.0     0
  23 │ 147.32    35.4652   19.0     0
  24 │ 147.955   40.313    29.0     1
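
A quick sanity check on the subset (a minimal sketch, not part of the original clip):

size(df2)             # (352, 4), matching the display above
minimum(df2[:age])    # at least 18, since only adults were kept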

Define the Stan language model

weightsmodel = "
data {
 int < lower = 1 > N; // Sample size
 vector[N] height; // Outcome
 vector[N] weight; // Predictor
}

parameters {
 real alpha; // Intercept
 real beta; // Slope (regression coefficients)
 real < lower = 0 > sigma; // Error SD
}

model {
 height ~ normal(alpha + weight * beta , sigma);
}

generated quantities {
}
"
"\ndata {\n int < lower = 1 > N; // Sample size\n vector[N] height; // Predictor\n vector[N] weight; // Outcome\n}\n\nparameters {\n real alpha; // Intercept\n real beta; // Slope (regression coefficients)\n real < lower = 0 > sigma; // Error SD\n}\n\nmodel {\n height ~ normal(alpha + weight * beta , sigma);\n}\n\ngenerated quantities {\n}\n"

Define the Stanmodel and set the output format to :mcmcchain.

stanmodel = Stanmodel(name="weights", monitors = ["alpha", "beta", "sigma"],
  model=weightsmodel, output_format=:mcmcchain)

Input data for cmdstan

heightsdata = [
  Dict("N" => length(df2[:height]), "height" => df2[:height], "weight" => df2[:weight])
]

1-element Array{Dict{String,Any},1}:
 Dict("height"=>Union{Missing, Float64}[151.765, 139.7, 136.525, 156.845, 145.415, 163.83, 149.225, 168.91, 147.955, 165.1  …  156.21, 160.655, 146.05, 156.21, 152.4, 162.56, 142.875, 162.56, 156.21, 158.75],"weight"=>Union{Missing, Float64}[47.8256, 36.4858, 31.8648, 53.0419, 41.2769, 62.9926, 38.2435, 55.48, 34.8699, 54.4877  …  44.0268, 47.8823, 39.4058, 41.0501, 40.8233, 47.0318, 34.2462, 52.1631, 54.0625, 52.5316],"N"=>352)

Sample using cmdstan

rc, chn, cnames = stan(stanmodel, heightsdata, ProjDir, diagnostics=false,
  CmdStanDir=CMDSTAN_HOME)
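
stan also returns a return code rc (0 on success); a minimal sketch of guarding on it before using the chains:

if rc == 0
  # Only inspect the chains if cmdstan finished successfully.
  display(chn)
end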

Show individual draws of correlated parameter values

display(chn.value[1:5,:,1])

make: `/home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/weights' is up to date.

Length of data array is not equal to nchains,
all chains will use the first data dictionary.

Calling /home/travis/cmdstan/bin/stansummary to infer across chains.

Inference for Stan model: weights_model
4 chains: each with iter=(1000,1000,1000,1000); warmup=(0,0,0,0); thin=(1,1,1,1); 4000 iterations saved.

Warmup took (0.62, 0.62, 0.66, 0.63) seconds, 2.5 seconds total
Sampling took (0.63, 0.59, 0.64, 0.82) seconds, 2.7 seconds total

                Mean     MCSE  StdDev     5%   50%   95%    N_Eff  N_Eff/s    R_hat
lp__            -747  3.5e-02     1.2   -750  -747  -746  1.2e+03  4.5e+02  1.0e+00
accept_stat__   0.94  4.4e-03   0.098   0.73  0.98   1.0  5.1e+02  1.9e+02  1.0e+00
stepsize__      0.10  1.2e-02   0.017  0.073  0.11  0.11  2.0e+00  7.4e-01  1.3e+14
treedepth__      4.1  8.4e-02     1.2    2.0   4.0   6.0  2.1e+02  7.7e+01  1.0e+00
n_leapfrog__      28  3.0e+00      18    3.0    31    63  3.5e+01  1.3e+01  1.0e+00
divergent__     0.00     -nan    0.00   0.00  0.00  0.00     -nan     -nan     -nan
energy__         749  5.3e-02     1.7    747   749   752  1.1e+03  4.0e+02  1.0e+00
alpha            114  5.5e-02     1.9    111   114   117  1.2e+03  4.6e+02  1.0e+00
beta            0.91  1.2e-03   0.043   0.83  0.91  0.98  1.2e+03  4.6e+02  1.0e+00
sigma            5.1  4.5e-03    0.19    4.8   5.1   5.4  1.9e+03  6.9e+02  1.0e+00

Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).
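
The same posterior summaries can be computed directly from the chains returned by stan. A minimal sketch, assuming chn.value is laid out as (iterations, parameters, chains) with the parameters in the monitors order alpha, beta, sigma (the same layout the plotting code below relies on):

using Statistics  # mean and quantile (stdlib; may already be loaded)

rws, vars, chns = size(chn[:, 1, :])
for (k, name) in enumerate(["alpha", "beta", "sigma"])
  # Pool the draws for this parameter across all chains.
  draws = convert(Vector{Float64}, reshape(chn.value[:, k, :], rws*chns))
  println(name, ": mean = ", round(mean(draws), digits=2),
    ", 90% interval = ", round.(quantile(draws, [0.05, 0.95]), digits=2))
end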

Plot estimates using the first N = 10, 50, 150 and 200 observations

p = Vector{Plots.Plot{Plots.GRBackend}}(undef, 4)
nvals = [10, 50, 150, 200]

for i in 1:length(nvals)
  N = nvals[i]
  heightsdataN = [
    Dict("N" => N, "height" => df2[1:N, :height], "weight" => df2[1:N, :weight])
  ]
  # Re-fit the model using only the first N observations.
  rc, chnN, cnames = stan(stanmodel, heightsdataN, ProjDir, diagnostics=false,
    CmdStanDir=CMDSTAN_HOME)

  # Pool the posterior draws of alpha and beta across chains.
  xi = 30.0:0.1:65.0
  rws, vars, chns = size(chnN[:, 1, :])
  alpha_vals = convert(Vector{Float64}, reshape(chnN.value[:, 1, :], (rws*chns)))
  beta_vals = convert(Vector{Float64}, reshape(chnN.value[:, 2, :], (rws*chns)))

  # Scatter the data and overlay the regression lines implied by the first N draws.
  p[i] = scatter(df2[1:N, :weight], df2[1:N, :height], leg=false, xlab="weight")
  for j in 1:N
    yi = alpha_vals[j] .+ beta_vals[j]*xi
    plot!(p[i], xi, yi, title="N = $N")
  end
end
plot(p..., layout=(2, 2))
[Figure: scatter plots of height versus weight with regression lines drawn from the posterior, for N = 10, 50, 150 and 200]
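
To keep the composite figure, it can be written to disk with Plots' savefig (a sketch; the filename is illustrative):

savefig("clip_45_47s_fits.png")  # saves the most recently produced plot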

End of clip_45_47s.jl

This page was generated using Literate.jl.