Load Julia packages (libraries) needed for the snippets in chapter 4
using StatisticalRethinking
using CmdStan, StanMCMCChain
gr(size=(500,800))

CmdStan uses a tmp directory to store the output of cmdstan.
ProjDir = rel_path("..", "chapters", "04")
cd(ProjDir)

snippet 4.7
howell1 = CSV.read(joinpath(dirname(Base.pathof(StatisticalRethinking)), "..", "data", "Howell1.csv"), delim=';')
df = convert(DataFrame, howell1);

Use only adults
df2 = filter(row -> row[:age] >= 18, df)

| Row | height | weight | age | male |
|---|---|---|---|---|
| | Float64⍰ | Float64⍰ | Float64⍰ | Int64⍰ |
| 1 | 151.765 | 47.8256 | 63.0 | 1 |
| 2 | 139.7 | 36.4858 | 63.0 | 0 |
| 3 | 136.525 | 31.8648 | 65.0 | 0 |
| 4 | 156.845 | 53.0419 | 41.0 | 1 |
| 5 | 145.415 | 41.2769 | 51.0 | 0 |
| 6 | 163.83 | 62.9926 | 35.0 | 1 |
| 7 | 149.225 | 38.2435 | 32.0 | 0 |
| 8 | 168.91 | 55.48 | 27.0 | 1 |
| 9 | 147.955 | 34.8699 | 19.0 | 0 |
| 10 | 165.1 | 54.4877 | 54.0 | 1 |
| 11 | 154.305 | 49.8951 | 47.0 | 0 |
| 12 | 151.13 | 41.2202 | 66.0 | 1 |
| 13 | 144.78 | 36.0322 | 73.0 | 0 |
| 14 | 149.9 | 47.7 | 20.0 | 0 |
| 15 | 150.495 | 33.8493 | 65.3 | 0 |
| 16 | 163.195 | 48.5627 | 36.0 | 1 |
| 17 | 157.48 | 42.3258 | 44.0 | 1 |
| 18 | 143.942 | 38.3569 | 31.0 | 0 |
| 19 | 161.29 | 48.9879 | 39.0 | 1 |
| 20 | 156.21 | 42.7227 | 29.0 | 0 |
| 21 | 146.4 | 35.4936 | 56.0 | 1 |
| 22 | 148.59 | 37.9033 | 45.0 | 0 |
| 23 | 147.32 | 35.4652 | 19.0 | 0 |
| 24 | 147.955 | 40.313 | 29.0 | 1 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
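As a quick sanity check (not in the original snippet), one can confirm that the filter kept only adults. This is a minimal sketch, assuming `df2` as created above.

# Sketch: verify the adult-only subset (assumes df2 from the filter above).
println("rows: ", size(df2, 1))                # expect 352 adult rows; this becomes N in the Stan data below
println("youngest age: ", minimum(df2[:age]))  # should be >= 18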
Define the Stan language model
weightsmodel = "
data {
int < lower = 1 > N; // Sample size
vector[N] height; // Outcome
vector[N] weight; // Predictor
}
parameters {
real alpha; // Intercept
real beta; // Slope (regression coefficients)
real < lower = 0 > sigma; // Error SD
}
model {
height ~ normal(alpha + weight * beta , sigma);
}
generated quantities {
}
""\ndata {\n int < lower = 1 > N; // Sample size\n vector[N] height; // Predictor\n vector[N] weight; // Outcome\n}\n\nparameters {\n real alpha; // Intercept\n real beta; // Slope (regression coefficients)\n real < lower = 0 > sigma; // Error SD\n}\n\nmodel {\n height ~ normal(alpha + weight * beta , sigma);\n}\n\ngenerated quantities {\n}\n"Define the Stanmodel and set the output format to :mcmcchain.
stanmodel = Stanmodel(name="weights", monitors = ["alpha", "beta", "sigma"],
  model=weightsmodel, output_format=:mcmcchain)

Input data for cmdstan
heightsdata = [
Dict("N" => length(df2[:height]), "height" => df2[:height], "weight" => df2[:weight])
]

Sample using cmdstan
rc, chn, cnames = stan(stanmodel, heightsdata, ProjDir, diagnostics=false,
  CmdStanDir=CMDSTAN_HOME)

Describe the draws
describe(chn)
Inference for Stan model: weights_model
4 chains: each with iter=(1000,1000,1000,1000); warmup=(0,0,0,0); thin=(1,1,1,1); 4000 iterations saved.
Warmup took (0.64, 0.67, 0.59, 0.61) seconds, 2.5 seconds total
Sampling took (0.73, 0.63, 0.59, 0.62) seconds, 2.6 seconds total
| | Mean | MCSE | StdDev | 5% | 50% | 95% | N_Eff | N_Eff/s | R_hat |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | -747 | 3.7e-02 | 1.3 | -750 | -747 | -746 | 1.2e+03 | 4.6e+02 | 1.0e+00 |
| accept_stat__ | 0.92 | 2.9e-03 | 0.11 | 0.68 | 0.97 | 1.0 | 1.5e+03 | 5.9e+02 | 1.0e+00 |
| stepsize__ | 0.11 | -nan | 0.016 | 0.081 | 0.11 | 0.12 | -nan | -nan | 8.0e+13 |
| treedepth__ | 4.0 | 2.4e-02 | 1.2 | 2.0 | 4.0 | 5.0 | 2.3e+03 | 9.1e+02 | 1.0e+00 |
| n_leapfrog__ | 26 | 1.2e+00 | 16 | 3.0 | 31 | 63 | 1.8e+02 | 6.9e+01 | 1.0e+00 |
| divergent__ | 0.00 | -nan | 0.00 | 0.00 | 0.00 | 0.00 | -nan | -nan | -nan |
| energy__ | 749 | 5.0e-02 | 1.8 | 747 | 749 | 752 | 1.3e+03 | 5.0e+02 | 1.0e+00 |
| alpha | 114 | 5.2e-02 | 1.9 | 111 | 114 | 117 | 1.4e+03 | 5.4e+02 | 1.0e+00 |
| beta | 0.91 | 1.1e-03 | 0.043 | 0.84 | 0.91 | 0.97 | 1.4e+03 | 5.4e+02 | 1.0e+00 |
| sigma | 5.1 | 5.0e-03 | 0.20 | 4.8 | 5.1 | 5.4 | 1.6e+03 | 6.1e+02 | 1.0e+00 |
Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).
Iterations = 1:1000
Thinning interval = 1
Chains = 1,2,3,4
Samples per chain = 1000
Empirical Posterior Estimates:
| | Mean | SD | Naive SE | MCSE | ESS |
|---|---|---|---|---|---|
| alpha | 113.86801250 | 1.928870896 | 0.0304981267 | 0.0542864187 | 1000 |
| beta | 0.90528437 | 0.042588843 | 0.0006733887 | 0.0012060295 | 1000 |
| sigma | 5.10472534 | 0.197426474 | 0.0031215866 | 0.0048241412 | 1000 |
Quantiles:
| | 2.5% | 25.0% | 50.0% | 75.0% | 97.5% |
|---|---|---|---|---|---|
| alpha | 110.0977500 | 112.5660000 | 113.8460000 | 115.18650000 | 117.70237500 |
| beta | 0.8219799 | 0.8760902 | 0.9054555 | 0.93417425 | 0.98891355 |
| sigma | 4.7394502 | 4.9717750 | 5.0947800 | 5.23358250 | 5.51619950 |

Compare with a previous result
clip_38s_example_output = "
Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).
Iterations = 1:1000
Thinning interval = 1
Chains = 1,2,3,4
Samples per chain = 1000
Empirical Posterior Estimates:
Mean SD Naive SE MCSE ESS
alpha 113.82267275 1.89871177 0.0300212691 0.053895503 1000
beta 0.90629952 0.04155225 0.0006569987 0.001184630 1000
sigma 5.10334279 0.19755211 0.0031235731 0.004830464 1000
Quantiles:
2.5% 25.0% 50.0% 75.0% 97.5%
alpha 110.1927000 112.4910000 113.7905000 115.1322500 117.5689750
beta 0.8257932 0.8775302 0.9069425 0.9357115 0.9862574
sigma 4.7308260 4.9644050 5.0958800 5.2331875 5.5133417
""\n\nSamples were drawn using hmc with nuts.\nFor each parameter, N_Eff is a crude measure of effective sample size,\nand R_hat is the potential scale reduction factor on split chains (at\nconvergence, R_hat=1).\n\nIterations = 1:1000\nThinning interval = 1\nChains = 1,2,3,4\nSamples per chain = 1000\n\nEmpirical Posterior Estimates:\n Mean SD Naive SE MCSE ESS\nalpha 113.82267275 1.89871177 0.0300212691 0.053895503 1000\n beta 0.90629952 0.04155225 0.0006569987 0.001184630 1000\nsigma 5.10334279 0.19755211 0.0031235731 0.004830464 1000\n\nQuantiles:\n 2.5% 25.0% 50.0% 75.0% 97.5%\nalpha 110.1927000 112.4910000 113.7905000 115.1322500 117.5689750\n beta 0.8257932 0.8775302 0.9069425 0.9357115 0.9862574\nsigma 4.7308260 4.9644050 5.0958800 5.2331875 5.5133417\n\n"Plot the density of posterior draws
plot(chn)
Plot regression line using means and observations
xi = 30.0:0.1:65.0                      # weight grid for the fitted line
rws, vars, chns = size(chn[:, 1, :])    # draws, variables, chains
alpha_vals = convert(Vector{Float64}, reshape(chn.value[:, 1, :], (rws*chns)))
beta_vals = convert(Vector{Float64}, reshape(chn.value[:, 2, :], (rws*chns)))
yi = mean(alpha_vals) .+ mean(beta_vals)*xi   # line through the posterior means
scatter(df2[:weight], df2[:height], lab="Observations")
plot!(xi, yi, lab="Regression line")
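Optionally (not part of the original clip), a few individual posterior draws can be overlaid to show the spread around the mean line. This sketch reuses `xi`, `alpha_vals`, and `beta_vals` from above; the number of draws and the output file name are arbitrary choices.

# Sketch: overlay a handful of posterior regression lines (20 draws is arbitrary).
for i in 1:20
  plot!(xi, alpha_vals[i] .+ beta_vals[i]*xi, color=:lightgrey, lab="")
end
plot!(xi, yi, lab="Regression line (mean)")
savefig(joinpath(ProjDir, "regression_lines.png"))  # hypothetical file name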
End of clip_38.0s.jl
This page was generated using Literate.jl.