Load Julia packages (libraries) needed for the snippets in chapter 0
using StatisticalRethinking
using CmdStan, StanMCMCChain
gr(size=(500,500))Plots.GRBackend()CmdStan uses a tmp directory to store the output of cmdstan
ProjDir = rel_path("..", "chapters", "04")
cd(ProjDir)snippet 4.7
howell1 = CSV.read(rel_path("..", "data", "Howell1.csv"), delim=';')
df = convert(DataFrame, howell1);Use only adults
df2 = filter(row -> row[:age] >= 18, df)| height | weight | age | male | |
|---|---|---|---|---|
| Float64⍰ | Float64⍰ | Float64⍰ | Int64⍰ | |
| 1 | 151.765 | 47.8256 | 63.0 | 1 |
| 2 | 139.7 | 36.4858 | 63.0 | 0 |
| 3 | 136.525 | 31.8648 | 65.0 | 0 |
| 4 | 156.845 | 53.0419 | 41.0 | 1 |
| 5 | 145.415 | 41.2769 | 51.0 | 0 |
| 6 | 163.83 | 62.9926 | 35.0 | 1 |
| 7 | 149.225 | 38.2435 | 32.0 | 0 |
| 8 | 168.91 | 55.48 | 27.0 | 1 |
| 9 | 147.955 | 34.8699 | 19.0 | 0 |
| 10 | 165.1 | 54.4877 | 54.0 | 1 |
| 11 | 154.305 | 49.8951 | 47.0 | 0 |
| 12 | 151.13 | 41.2202 | 66.0 | 1 |
| 13 | 144.78 | 36.0322 | 73.0 | 0 |
| 14 | 149.9 | 47.7 | 20.0 | 0 |
| 15 | 150.495 | 33.8493 | 65.3 | 0 |
| 16 | 163.195 | 48.5627 | 36.0 | 1 |
| 17 | 157.48 | 42.3258 | 44.0 | 1 |
| 18 | 143.942 | 38.3569 | 31.0 | 0 |
| 19 | 161.29 | 48.9879 | 39.0 | 1 |
| 20 | 156.21 | 42.7227 | 29.0 | 0 |
| 21 | 146.4 | 35.4936 | 56.0 | 1 |
| 22 | 148.59 | 37.9033 | 45.0 | 0 |
| 23 | 147.32 | 35.4652 | 19.0 | 0 |
| 24 | 147.955 | 40.313 | 29.0 | 1 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
Define the Stan language model
weightsmodel = "
data {
int < lower = 1 > N; // Sample size
vector[N] height; // Predictor
vector[N] weight; // Outcome
}
parameters {
real alpha; // Intercept
real beta; // Slope (regression coefficients)
real < lower = 0 > sigma; // Error SD
}
model {
height ~ normal(alpha + weight * beta , sigma);
}
generated quantities {
}
""\ndata {\n int < lower = 1 > N; // Sample size\n vector[N] height; // Predictor\n vector[N] weight; // Outcome\n}\n\nparameters {\n real alpha; // Intercept\n real beta; // Slope (regression coefficients)\n real < lower = 0 > sigma; // Error SD\n}\n\nmodel {\n height ~ normal(alpha + weight * beta , sigma);\n}\n\ngenerated quantities {\n}\n"Define the Stanmodel and set the output format to :mcmcchain.
stanmodel = Stanmodel(name="weights", monitors = ["alpha", "beta", "sigma"],model=weightsmodel,
output_format=:mcmcchain)Input data for cmdstan
heightsdata = [
Dict("N" => length(df2[:height]), "height" => df2[:height], "weight" => df2[:weight])
]=====> /home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04
1-element Array{Dict{String,Any},1}:
Dict("height"=>Union{Missing, Float64}[151.765, 139.7, 136.525, 156.845, 145.415, 163.83, 149.225, 168.91, 147.955, 165.1 … 156.21, 160.655, 146.05, 156.21, 152.4, 162.56, 142.875, 162.56, 156.21, 158.75],"weight"=>Union{Missing, Float64}[47.8256, 36.4858, 31.8648, 53.0419, 41.2769, 62.9926, 38.2435, 55.48, 34.8699, 54.4877 … 44.0268, 47.8823, 39.4058, 41.0501, 40.8233, 47.0318, 34.2462, 52.1631, 54.0625, 52.5316],"N"=>352)Sample using cmdstan
rc, chn, cnames = stan(stanmodel, heightsdata, ProjDir, diagnostics=false,
CmdStanDir=CMDSTAN_HOME)Show individual draws of correlated parameter values
display(chn.value[1:5,:,1])
make: `/home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/weights' is up to date.
Length of data array is not equal to nchains,
all chains will use the first data dictionary.
Calling /home/travis/cmdstan/bin/stansummary to infer across chains.
Inference for Stan model: weights_model
4 chains: each with iter=(1000,1000,1000,1000); warmup=(0,0,0,0); thin=(1,1,1,1); 4000 iterations saved.
Warmup took (0.62, 0.62, 0.66, 0.63) seconds, 2.5 seconds total
Sampling took (0.63, 0.59, 0.64, 0.82) seconds, 2.7 seconds total
Mean MCSE StdDev 5% 50% 95% N_Eff N_Eff/s R_hat
lp__ -747 3.5e-02 1.2 -750 -747 -746 1.2e+03 4.5e+02 1.0e+00
accept_stat__ 0.94 4.4e-03 0.098 0.73 0.98 1.0 5.1e+02 1.9e+02 1.0e+00
stepsize__ 0.10 1.2e-02 0.017 0.073 0.11 0.11 2.0e+00 7.4e-01 1.3e+14
treedepth__ 4.1 8.4e-02 1.2 2.0 4.0 6.0 2.1e+02 7.7e+01 1.0e+00
n_leapfrog__ 28 3.0e+00 18 3.0 31 63 3.5e+01 1.3e+01 1.0e+00
divergent__ 0.00 -nan 0.00 0.00 0.00 0.00 -nan -nan -nan
energy__ 749 5.3e-02 1.7 747 749 752 1.1e+03 4.0e+02 1.0e+00
alpha 114 5.5e-02 1.9 111 114 117 1.2e+03 4.6e+02 1.0e+00
beta 0.91 1.2e-03 0.043 0.83 0.91 0.98 1.2e+03 4.6e+02 1.0e+00
sigma 5.1 4.5e-03 0.19 4.8 5.1 5.4 1.9e+03 6.9e+02 1.0e+00
Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).Plot estimates using the first N = 10 observations
p = Vector{Plots.Plot{Plots.GRBackend}}(undef, 4)
nvals = [10, 50, 150, 200]
for i in 1:length(nvals)
N = nvals[i]
heightsdataN = [
Dict("N" => N, "height" => df2[1:N, :height], "weight" => df2[1:N, :weight])
]
rc, chnN, cnames = stan(stanmodel, heightsdataN, ProjDir, diagnostics=false,
CmdStanDir=CMDSTAN_HOME)
xi = 30.0:0.1:65.0
rws, vars, chns = size(chnN[:, 1, :])
alpha_vals = convert(Vector{Float64}, reshape(chnN.value[:, 1, :], (rws*chns)))
beta_vals = convert(Vector{Float64}, reshape(chnN.value[:, 2, :], (rws*chns)))
p[i] = scatter(df2[1:N, :weight], df2[1:N, :height], leg=false, xlab="weight")
for j in 1:N
yi = alpha_vals[j] .+ beta_vals[j]*xi
plot!(p[i], xi, yi, title="N = $N")
end
end
plot(p..., layout=(2, 2))
End of clip4547s.jl
This page was generated using Literate.jl.