Load Julia packages (libraries) needed for the snippets in chapter 0
using StatisticalRethinking
using CmdStan, StanMCMCChain
gr(size=(500,800))Plots.GRBackend()CmdStan uses a tmp directory to store the output of cmdstan
ProjDir = rel_path("..", "chapters", "04")
cd(ProjDir)snippet 4.7
howell1 = CSV.read(joinpath(dirname(Base.pathof(StatisticalRethinking)), "..", "data", "Howell1.csv"), delim=';')
df = convert(DataFrame, howell1);Use only adults
df2 = filter(row -> row[:age] >= 18, df)| height | weight | age | male | |
|---|---|---|---|---|
| Float64⍰ | Float64⍰ | Float64⍰ | Int64⍰ | |
| 1 | 151.765 | 47.8256 | 63.0 | 1 |
| 2 | 139.7 | 36.4858 | 63.0 | 0 |
| 3 | 136.525 | 31.8648 | 65.0 | 0 |
| 4 | 156.845 | 53.0419 | 41.0 | 1 |
| 5 | 145.415 | 41.2769 | 51.0 | 0 |
| 6 | 163.83 | 62.9926 | 35.0 | 1 |
| 7 | 149.225 | 38.2435 | 32.0 | 0 |
| 8 | 168.91 | 55.48 | 27.0 | 1 |
| 9 | 147.955 | 34.8699 | 19.0 | 0 |
| 10 | 165.1 | 54.4877 | 54.0 | 1 |
| 11 | 154.305 | 49.8951 | 47.0 | 0 |
| 12 | 151.13 | 41.2202 | 66.0 | 1 |
| 13 | 144.78 | 36.0322 | 73.0 | 0 |
| 14 | 149.9 | 47.7 | 20.0 | 0 |
| 15 | 150.495 | 33.8493 | 65.3 | 0 |
| 16 | 163.195 | 48.5627 | 36.0 | 1 |
| 17 | 157.48 | 42.3258 | 44.0 | 1 |
| 18 | 143.942 | 38.3569 | 31.0 | 0 |
| 19 | 161.29 | 48.9879 | 39.0 | 1 |
| 20 | 156.21 | 42.7227 | 29.0 | 0 |
| 21 | 146.4 | 35.4936 | 56.0 | 1 |
| 22 | 148.59 | 37.9033 | 45.0 | 0 |
| 23 | 147.32 | 35.4652 | 19.0 | 0 |
| 24 | 147.955 | 40.313 | 29.0 | 1 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
Define the Stan language model
heightsmodel = "
// Inferring a Rate
data {
int N;
real<lower=0> h[N];
}
parameters {
real<lower=0> sigma;
real<lower=0,upper=250> mu;
}
model {
// Priors for mu and sigma
mu ~ uniform(100, 250);
sigma ~ cauchy( 0 , 1 );
// Observed heights
h ~ normal(mu, sigma);
}
""\n// Inferring a Rate\ndata {\n int N;\n real<lower=0> h[N];\n}\nparameters {\n real<lower=0> sigma;\n real<lower=0,upper=250> mu;\n}\nmodel {\n // Priors for mu and sigma\n mu ~ uniform(100, 250);\n sigma ~ cauchy( 0 , 1 );\n\n // Observed heights\n h ~ normal(mu, sigma);\n}\n"Define the Stanmodel and set the output format to :mcmcchain.
stanmodel = Stanmodel(name="heights", monitors = ["mu", "sigma"],model=heightsmodel,
output_format=:mcmcchain)Input data for cmdstan
heightsdata = [
Dict("N" => length(df2[:height]), "h" => df2[:height])
]=====> /home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04
File /home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/heights.stan will be updated.
1-element Array{Dict{String,Any},1}:
Dict("N"=>352,"h"=>Union{Missing, Float64}[151.765, 139.7, 136.525, 156.845, 145.415, 163.83, 149.225, 168.91, 147.955, 165.1 … 156.21, 160.655, 146.05, 156.21, 152.4, 162.56, 142.875, 162.56, 156.21, 158.75])Sample using cmdstan
rc, chn, cnames = stan(stanmodel, heightsdata, ProjDir, diagnostics=false,
CmdStanDir=CMDSTAN_HOME)Describe the draws
describe(chn)
--- Translating Stan model to C++ code ---
bin/stanc /home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/heights.stan --o=/home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/heights.hpp
Model name=heights_model
Input file=/home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/heights.stan
Output file=/home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/heights.hpp
--- Linking C++ model ---
g++ -Wall -I . -isystem stan/lib/stan_math/lib/eigen_3.3.3 -isystem stan/lib/stan_math/lib/boost_1.66.0 -isystem stan/lib/stan_math/lib/sundials_3.1.0/include -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -I src -isystem stan/src -isystem stan/lib/stan_math/ -DFUSION_MAX_VECTOR_SIZE=12 -Wno-unused-local-typedefs -DEIGEN_NO_DEBUG -DNO_FPRINTF_OUTPUT -pipe src/cmdstan/main.cpp -O3 -o /home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/heights -include /home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/heights.hpp stan/lib/stan_math/lib/sundials_3.1.0/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_3.1.0/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_3.1.0/lib/libsundials_idas.a
Length of data array is not equal to nchains,
all chains will use the first data dictionary.
Calling /home/travis/cmdstan/bin/stansummary to infer across chains.
Inference for Stan model: heights_model
4 chains: each with iter=(1000,1000,1000,1000); warmup=(0,0,0,0); thin=(1,1,1,1); 4000 iterations saved.
Warmup took (0.029, 0.026, 0.025, 0.027) seconds, 0.11 seconds total
Sampling took (0.037, 0.046, 0.040, 0.052) seconds, 0.17 seconds total
Mean MCSE StdDev 5% 50% 95% N_Eff N_Eff/s R_hat
lp__ -895 2.3e-02 1.0 -897 -895 -894 2.0e+03 1.1e+04 1.0e+00
accept_stat__ 0.92 1.4e-03 0.098 0.71 0.96 1.0 4.7e+03 2.7e+04 1.0e+00
stepsize__ 0.81 2.4e-02 0.034 0.77 0.82 0.86 2.0e+00 1.1e+01 3.0e+13
treedepth__ 2.1 1.0e-01 0.83 1.0 2.0 4.0 6.8e+01 3.9e+02 1.0e+00
n_leapfrog__ 6.0 1.1e+00 8.4 1.0 3.0 23 5.9e+01 3.4e+02 1.0e+00
divergent__ 0.00 -nan 0.00 0.00 0.00 0.00 -nan -nan -nan
energy__ 896 3.5e-02 1.5 894 896 899 1.7e+03 9.9e+03 1.0e+00
sigma 7.7 5.4e-03 0.30 7.3 7.7 8.2 3.0e+03 1.7e+04 1.0e+00
mu 155 6.8e-03 0.42 154 155 155 3.8e+03 2.2e+04 1.0e+00
Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).
Iterations = 1:1000
Thinning interval = 1
Chains = 1,2,3,4
Samples per chain = 1000
Empirical Posterior Estimates:
Mean SD Naive SE MCSE ESS
sigma 7.746845 0.29698506 0.004695746 0.005029119 1000
mu 154.597421 0.41688256 0.006591492 0.006623318 1000
Quantiles:
2.5% 25.0% 50.0% 75.0% 97.5%
sigma 7.183481 7.5444175 7.740005 7.938735 8.362853
mu 153.803975 154.3120000 154.598000 154.877000 155.411100Compare with previous result
clip_07s_example_output = "
Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).
Iterations = 1:1000
Thinning interval = 1
Chains = 1,2,3,4
Samples per chain = 1000
Empirical Posterior Estimates:
Mean SD Naive SE MCSE ESS
sigma 7.7641718 0.3055115 0.004830561 0.0047596714 1000
mu 154.6042417 0.4158242 0.006574758 0.0068304868 1000
Quantiles:
2.5% 25.0% 50.0% 75.0% 97.5%
sigma 7.198721 7.5573575 7.749435 7.960795 8.393317
mu 153.795975 154.3307500 154.610000 154.884000 155.391050
""\n\nSamples were drawn using hmc with nuts.\nFor each parameter, N_Eff is a crude measure of effective sample size,\nand R_hat is the potential scale reduction factor on split chains (at\nconvergence, R_hat=1).\n\nIterations = 1:1000\nThinning interval = 1\nChains = 1,2,3,4\nSamples per chain = 1000\n\nEmpirical Posterior Estimates:\n Mean SD Naive SE MCSE ESS\nsigma 7.7641718 0.3055115 0.004830561 0.0047596714 1000\n mu 154.6042417 0.4158242 0.006574758 0.0068304868 1000\n\nQuantiles:\n 2.5% 25.0% 50.0% 75.0% 97.5%\nsigma 7.198721 7.5573575 7.749435 7.960795 8.393317\n mu 153.795975 154.3307500 154.610000 154.884000 155.391050\n\n"Plot the density of posterior draws
density(chn)
Close cd(ProjDir) do block
This page was generated using Literate.jl.