clip_45_47s

Load the Julia packages (libraries) needed for the snippets in chapter 4

using StatisticalRethinking
using CmdStan, StanMCMCChain
gr(size=(500,500))
Plots.GRBackend()

CmdStan uses a tmp directory to store the output of cmdstan

ProjDir = rel_path("..", "chapters", "04")
cd(ProjDir)

snippet 4.7

howell1 = CSV.read(rel_path("..", "data", "Howell1.csv"), delim=';')
df = convert(DataFrame, howell1);

Use only adults

df2 = filter(row -> row[:age] >= 18, df)

352 rows × 4 columns (first 24 rows shown)

 Row │ height    weight    age      male
     │ Float64⍰  Float64⍰  Float64⍰ Int64⍰
─────┼──────────────────────────────────────
   1 │ 151.765   47.8256   63.0     1
   2 │ 139.7     36.4858   63.0     0
   3 │ 136.525   31.8648   65.0     0
   4 │ 156.845   53.0419   41.0     1
   5 │ 145.415   41.2769   51.0     0
   6 │ 163.83    62.9926   35.0     1
   7 │ 149.225   38.2435   32.0     0
   8 │ 168.91    55.48     27.0     1
   9 │ 147.955   34.8699   19.0     0
  10 │ 165.1     54.4877   54.0     1
  11 │ 154.305   49.8951   47.0     0
  12 │ 151.13    41.2202   66.0     1
  13 │ 144.78    36.0322   73.0     0
  14 │ 149.9     47.7      20.0     0
  15 │ 150.495   33.8493   65.3     0
  16 │ 163.195   48.5627   36.0     1
  17 │ 157.48    42.3258   44.0     1
  18 │ 143.942   38.3569   31.0     0
  19 │ 161.29    48.9879   39.0     1
  20 │ 156.21    42.7227   29.0     0
  21 │ 146.4     35.4936   56.0     1
  22 │ 148.59    37.9033   45.0     0
  23 │ 147.32    35.4652   19.0     0
  24 │ 147.955   40.313    29.0     1
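
A quick sanity check on the subset (a minimal sketch, not part of the original clip):

size(df2)             # (352, 4), matching the display above
minimum(df2[:age])    # at least 18, since only adults were kept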

Define the Stan language model

weightsmodel = "
data {
 int < lower = 1 > N; // Sample size
 vector[N] height; // Outcome
 vector[N] weight; // Predictor
}

parameters {
 real alpha; // Intercept
 real beta; // Slope (regression coefficients)
 real < lower = 0 > sigma; // Error SD
}

model {
 height ~ normal(alpha + weight * beta , sigma);
}

generated quantities {
}
"
"\ndata {\n int < lower = 1 > N; // Sample size\n vector[N] height; // Predictor\n vector[N] weight; // Outcome\n}\n\nparameters {\n real alpha; // Intercept\n real beta; // Slope (regression coefficients)\n real < lower = 0 > sigma; // Error SD\n}\n\nmodel {\n height ~ normal(alpha + weight * beta , sigma);\n}\n\ngenerated quantities {\n}\n"

Define the Stanmodel and set the output format to :mcmcchain.

stanmodel = Stanmodel(name="weights", monitors = ["alpha", "beta", "sigma"],
  model=weightsmodel, output_format=:mcmcchain)

Input data for cmdstan

heightsdata = [
  Dict("N" => length(df2[:height]), "height" => df2[:height], "weight" => df2[:weight])
]

1-element Array{Dict{String,Any},1}:
 Dict("height"=>Union{Missing, Float64}[151.765, 139.7, 136.525, 156.845, 145.415, 163.83, 149.225, 168.91, 147.955, 165.1  …  156.21, 160.655, 146.05, 156.21, 152.4, 162.56, 142.875, 162.56, 156.21, 158.75],"weight"=>Union{Missing, Float64}[47.8256, 36.4858, 31.8648, 53.0419, 41.2769, 62.9926, 38.2435, 55.48, 34.8699, 54.4877  …  44.0268, 47.8823, 39.4058, 41.0501, 40.8233, 47.0318, 34.2462, 52.1631, 54.0625, 52.5316],"N"=>352)

Sample using cmdstan

rc, chn, cnames = stan(stanmodel, heightsdata, ProjDir, diagnostics=false,
  CmdStanDir=CMDSTAN_HOME)
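
stan also returns a return code rc (0 on success); a minimal sketch of guarding on it before using the chains:

if rc == 0
  # Only inspect the chains if cmdstan finished successfully.
  display(chn)
end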

Show individual draws of correlated parameter values

display(chn.value[1:5,:,1])

make: `/home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/weights' is up to date.

Length of data array is not equal to nchains,
all chains will use the first data dictionary.

Calling /home/travis/cmdstan/bin/stansummary to infer across chains.

Inference for Stan model: weights_model
4 chains: each with iter=(1000,1000,1000,1000); warmup=(0,0,0,0); thin=(1,1,1,1); 4000 iterations saved.

Warmup took (0.62, 0.62, 0.66, 0.63) seconds, 2.5 seconds total
Sampling took (0.63, 0.59, 0.64, 0.82) seconds, 2.7 seconds total

                Mean     MCSE  StdDev     5%   50%   95%    N_Eff  N_Eff/s    R_hat
lp__            -747  3.5e-02     1.2   -750  -747  -746  1.2e+03  4.5e+02  1.0e+00
accept_stat__   0.94  4.4e-03   0.098   0.73  0.98   1.0  5.1e+02  1.9e+02  1.0e+00
stepsize__      0.10  1.2e-02   0.017  0.073  0.11  0.11  2.0e+00  7.4e-01  1.3e+14
treedepth__      4.1  8.4e-02     1.2    2.0   4.0   6.0  2.1e+02  7.7e+01  1.0e+00
n_leapfrog__      28  3.0e+00      18    3.0    31    63  3.5e+01  1.3e+01  1.0e+00
divergent__     0.00     -nan    0.00   0.00  0.00  0.00     -nan     -nan     -nan
energy__         749  5.3e-02     1.7    747   749   752  1.1e+03  4.0e+02  1.0e+00
alpha            114  5.5e-02     1.9    111   114   117  1.2e+03  4.6e+02  1.0e+00
beta            0.91  1.2e-03   0.043   0.83  0.91  0.98  1.2e+03  4.6e+02  1.0e+00
sigma            5.1  4.5e-03    0.19    4.8   5.1   5.4  1.9e+03  6.9e+02  1.0e+00

Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).
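
The same posterior summaries can be computed directly from the chains returned by stan. A minimal sketch, assuming chn.value is laid out as (iterations, parameters, chains) with the parameters in the monitors order alpha, beta, sigma (the same layout the plotting code below relies on):

using Statistics  # mean and quantile (stdlib; may already be loaded)

rws, vars, chns = size(chn[:, 1, :])
for (k, name) in enumerate(["alpha", "beta", "sigma"])
  # Pool the draws for this parameter across all chains.
  draws = convert(Vector{Float64}, reshape(chn.value[:, k, :], rws*chns))
  println(name, ": mean = ", round(mean(draws), digits=2),
    ", 90% interval = ", round.(quantile(draws, [0.05, 0.95]), digits=2))
end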

Plot estimates using the first N = 10, 50, 150 and 200 observations

p = Vector{Plots.Plot{Plots.GRBackend}}(undef, 4)
nvals = [10, 50, 150, 200]

for i in 1:length(nvals)
  N = nvals[i]
  heightsdataN = [
    Dict("N" => N, "height" => df2[1:N, :height], "weight" => df2[1:N, :weight])
  ]
  # Re-fit the model using only the first N observations.
  rc, chnN, cnames = stan(stanmodel, heightsdataN, ProjDir, diagnostics=false,
    CmdStanDir=CMDSTAN_HOME)

  # Pool the posterior draws of alpha and beta across chains.
  xi = 30.0:0.1:65.0
  rws, vars, chns = size(chnN[:, 1, :])
  alpha_vals = convert(Vector{Float64}, reshape(chnN.value[:, 1, :], (rws*chns)))
  beta_vals = convert(Vector{Float64}, reshape(chnN.value[:, 2, :], (rws*chns)))

  # Scatter the data and overlay the regression lines implied by the first N draws.
  p[i] = scatter(df2[1:N, :weight], df2[1:N, :height], leg=false, xlab="weight")
  for j in 1:N
    yi = alpha_vals[j] .+ beta_vals[j]*xi
    plot!(p[i], xi, yi, title="N = $N")
  end
end
plot(p..., layout=(2, 2))
[Figure: scatter plots of height versus weight with regression lines drawn from the posterior, for N = 10, 50, 150 and 200]
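
To keep the composite figure, it can be written to disk with Plots' savefig (a sketch; the filename is illustrative):

savefig("clip_45_47s_fits.png")  # saves the most recently produced plot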

End of clip_45_47s.jl

This page was generated using Literate.jl.