clip_38.0s

Load Julia packages (libraries) needed for the snippets in chapter 4

using StatisticalRethinking
using CmdStan, StanMCMCChain
gr(size=(500,800))
Plots.GRBackend()

CmdStan.jl uses a tmp subdirectory of the project directory to store the generated model files and the output of cmdstan

ProjDir = rel_path("..", "chapters", "04")
cd(ProjDir)
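
CmdStan.jl places these files in a tmp subdirectory of ProjDir (the paths in the output below show this). A minimal sketch, assuming that default layout, of making sure the directory exists before sampling:

tmpdir = joinpath(ProjDir, "tmp")   # tmp location used in this run
isdir(tmpdir) || mkdir(tmpdir)      # create it if it does not exist yet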

snippet 4.7

howell1 = CSV.read(joinpath(dirname(Base.pathof(StatisticalRethinking)), "..", "data", "Howell1.csv"), delim=';')
df = convert(DataFrame, howell1);

Use only adults

df2 = filter(row -> row[:age] >= 18, df)

352 rows × 4 columns (first 24 rows shown)

        height    weight    age       male
        Float64⍰  Float64⍰  Float64⍰  Int64⍰
   1    151.765   47.8256   63.0      1
   2    139.7     36.4858   63.0      0
   3    136.525   31.8648   65.0      0
   4    156.845   53.0419   41.0      1
   5    145.415   41.2769   51.0      0
   6    163.83    62.9926   35.0      1
   7    149.225   38.2435   32.0      0
   8    168.91    55.48     27.0      1
   9    147.955   34.8699   19.0      0
  10    165.1     54.4877   54.0      1
  11    154.305   49.8951   47.0      0
  12    151.13    41.2202   66.0      1
  13    144.78    36.0322   73.0      0
  14    149.9     47.7      20.0      0
  15    150.495   33.8493   65.3      0
  16    163.195   48.5627   36.0      1
  17    157.48    42.3258   44.0      1
  18    143.942   38.3569   31.0      0
  19    161.29    48.9879   39.0      1
  20    156.21    42.7227   29.0      0
  21    146.4     35.4936   56.0      1
  22    148.59    37.9033   45.0      0
  23    147.32    35.4652   19.0      0
  24    147.955   40.313    29.0      1
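
As a quick sanity check (a sketch, not part of the original script) that the filter kept only adults and that the row count matches the display above:

@assert all(df2[:age] .>= 18)   # every remaining row is an adult
size(df2, 1)                    # 352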

Define the Stan language model

weightsmodel = "
data {
 int < lower = 1 > N; // Sample size
 vector[N] height; // Outcome
 vector[N] weight; // Predictor
}

parameters {
 real alpha; // Intercept
 real beta; // Slope (regression coefficients)
 real < lower = 0 > sigma; // Error SD
}

model {
 height ~ normal(alpha + weight * beta , sigma);
}

generated quantities {
}
"
"\ndata {\n int < lower = 1 > N; // Sample size\n vector[N] height; // Predictor\n vector[N] weight; // Outcome\n}\n\nparameters {\n real alpha; // Intercept\n real beta; // Slope (regression coefficients)\n real < lower = 0 > sigma; // Error SD\n}\n\nmodel {\n height ~ normal(alpha + weight * beta , sigma);\n}\n\ngenerated quantities {\n}\n"

Define the Stanmodel and set the output format to :mcmcchain.

stanmodel = Stanmodel(name="weights", monitors=["alpha", "beta", "sigma"], model=weightsmodel,
  output_format=:mcmcchain)

Input data for cmdstan

heightsdata = [
  Dict("N" => length(df2[:height]), "height" => df2[:height], "weight" => df2[:weight])
]
=====> /home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04


File /home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/weights.stan will be updated.

1-element Array{Dict{String,Any},1}:
 Dict("height"=>Union{Missing, Float64}[151.765, 139.7, 136.525, 156.845, 145.415, 163.83, 149.225, 168.91, 147.955, 165.1  …  156.21, 160.655, 146.05, 156.21, 152.4, 162.56, 142.875, 162.56, 156.21, 158.75],"weight"=>Union{Missing, Float64}[47.8256, 36.4858, 31.8648, 53.0419, 41.2769, 62.9926, 38.2435, 55.48, 34.8699, 54.4877  …  44.0268, 47.8823, 39.4058, 41.0501, 40.8233, 47.0318, 34.2462, 52.1631, 54.0625, 52.5316],"N"=>352)

Sample using cmdstan

rc, chn, cnames = stan(stanmodel, heightsdata, ProjDir, diagnostics=false,
  CmdStanDir=CMDSTAN_HOME)

Describe the draws

describe(chn)


--- Translating Stan model to C++ code ---
bin/stanc  /home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/weights.stan --o=/home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/weights.hpp
Model name=weights_model
Input file=/home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/weights.stan
Output file=/home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/weights.hpp

--- Linking C++ model ---
g++ -Wall -I . -isystem stan/lib/stan_math/lib/eigen_3.3.3 -isystem stan/lib/stan_math/lib/boost_1.66.0 -isystem stan/lib/stan_math/lib/sundials_3.1.0/include -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -I src -isystem stan/src -isystem stan/lib/stan_math/ -DFUSION_MAX_VECTOR_SIZE=12 -Wno-unused-local-typedefs -DEIGEN_NO_DEBUG -DNO_FPRINTF_OUTPUT -pipe   src/cmdstan/main.cpp  -O3 -o /home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/weights -include /home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/weights.hpp stan/lib/stan_math/lib/sundials_3.1.0/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_3.1.0/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_3.1.0/lib/libsundials_idas.a

Length of data array is not equal to nchains,
all chains will use the first data dictionary.

Calling /home/travis/cmdstan/bin/stansummary to infer across chains.

Inference for Stan model: weights_model
4 chains: each with iter=(1000,1000,1000,1000); warmup=(0,0,0,0); thin=(1,1,1,1); 4000 iterations saved.

Warmup took (0.64, 0.67, 0.59, 0.61) seconds, 2.5 seconds total
Sampling took (0.73, 0.63, 0.59, 0.62) seconds, 2.6 seconds total

                Mean     MCSE  StdDev     5%   50%   95%    N_Eff  N_Eff/s    R_hat
lp__            -747  3.7e-02     1.3   -750  -747  -746  1.2e+03  4.6e+02  1.0e+00
accept_stat__   0.92  2.9e-03    0.11   0.68  0.97   1.0  1.5e+03  5.9e+02  1.0e+00
stepsize__      0.11     -nan   0.016  0.081  0.11  0.12     -nan     -nan  8.0e+13
treedepth__      4.0  2.4e-02     1.2    2.0   4.0   5.0  2.3e+03  9.1e+02  1.0e+00
n_leapfrog__      26  1.2e+00      16    3.0    31    63  1.8e+02  6.9e+01  1.0e+00
divergent__     0.00     -nan    0.00   0.00  0.00  0.00     -nan     -nan     -nan
energy__         749  5.0e-02     1.8    747   749   752  1.3e+03  5.0e+02  1.0e+00
alpha            114  5.2e-02     1.9    111   114   117  1.4e+03  5.4e+02  1.0e+00
beta            0.91  1.1e-03   0.043   0.84  0.91  0.97  1.4e+03  5.4e+02  1.0e+00
sigma            5.1  5.0e-03    0.20    4.8   5.1   5.4  1.6e+03  6.1e+02  1.0e+00

Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).

Iterations = 1:1000
Thinning interval = 1
Chains = 1,2,3,4
Samples per chain = 1000

Empirical Posterior Estimates:
          Mean          SD       Naive SE       MCSE      ESS
alpha 113.86801250 1.928870896 0.0304981267 0.0542864187 1000
 beta   0.90528437 0.042588843 0.0006733887 0.0012060295 1000
sigma   5.10472534 0.197426474 0.0031215866 0.0048241412 1000

Quantiles:
          2.5%       25.0%       50.0%        75.0%        97.5%
alpha 110.0977500 112.5660000 113.8460000 115.18650000 117.70237500
 beta   0.8219799   0.8760902   0.9054555   0.93417425   0.98891355
sigma   4.7394502   4.9717750   5.0947800   5.23358250   5.51619950
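
To read these estimates: at the posterior means, the expected height of a 50 kg adult is roughly 113.9 + 0.91 * 50 ≈ 159 cm. A sketch of computing this directly from the chain (columns 1 and 2 hold the alpha and beta draws, as in the plotting code further below):

mean(chn.value[:, 1, :]) + mean(chn.value[:, 2, :]) * 50.0   # expected height (cm) at weight = 50 kg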

Compare with a previous result

clip_38s_example_output = "

Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).

Iterations = 1:1000
Thinning interval = 1
Chains = 1,2,3,4
Samples per chain = 1000

Empirical Posterior Estimates:
          Mean         SD       Naive SE       MCSE     ESS
alpha 113.82267275 1.89871177 0.0300212691 0.053895503 1000
 beta   0.90629952 0.04155225 0.0006569987 0.001184630 1000
sigma   5.10334279 0.19755211 0.0031235731 0.004830464 1000

Quantiles:
          2.5%       25.0%       50.0%       75.0%       97.5%
alpha 110.1927000 112.4910000 113.7905000 115.1322500 117.5689750
 beta   0.8257932   0.8775302   0.9069425   0.9357115   0.9862574
sigma   4.7308260   4.9644050   5.0958800   5.2331875   5.5133417

"
"\n\nSamples were drawn using hmc with nuts.\nFor each parameter, N_Eff is a crude measure of effective sample size,\nand R_hat is the potential scale reduction factor on split chains (at\nconvergence, R_hat=1).\n\nIterations = 1:1000\nThinning interval = 1\nChains = 1,2,3,4\nSamples per chain = 1000\n\nEmpirical Posterior Estimates:\n          Mean         SD       Naive SE       MCSE     ESS\nalpha 113.82267275 1.89871177 0.0300212691 0.053895503 1000\n beta   0.90629952 0.04155225 0.0006569987 0.001184630 1000\nsigma   5.10334279 0.19755211 0.0031235731 0.004830464 1000\n\nQuantiles:\n          2.5%       25.0%       50.0%       75.0%       97.5%\nalpha 110.1927000 112.4910000 113.7905000 115.1322500 117.5689750\n beta   0.8257932   0.8775302   0.9069425   0.9357115   0.9862574\nsigma   4.7308260   4.9644050   5.0958800   5.2331875   5.5133417\n\n"

Plot the traces and densities of the posterior draws

plot(chn)
[Figure: trace plots (sample value vs. iteration) and posterior density plots for alpha, beta, and sigma]

Plot the regression line (at the posterior means) and the observations

xi = 30.0:0.1:65.0                       # weight grid for the regression line
rws, vars, chns = size(chn[:, 1, :])     # number of draws, variables and chains
alpha_vals = convert(Vector{Float64}, reshape(chn.value[:, 1, :], (rws*chns)))  # intercept draws
beta_vals = convert(Vector{Float64}, reshape(chn.value[:, 2, :], (rws*chns)))   # slope draws
yi = mean(alpha_vals) .+ mean(beta_vals)*xi   # regression line at the posterior means

scatter(df2[:weight], df2[:height], lab="Observations")
plot!(xi, yi, lab="Regression line")
[Figure: scatter plot of the height vs. weight observations with the fitted regression line]
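
A possible extension of this plot (not part of the original script): overlay a handful of regression lines drawn from the posterior samples to visualize the uncertainty around the mean line.

for i in 1:20                            # 20 randomly chosen posterior draws
  j = rand(1:length(alpha_vals))
  plot!(xi, alpha_vals[j] .+ beta_vals[j]*xi, color=:lightblue, alpha=0.3, lab="")
end
plot!(xi, yi, lab="Regression line")     # redraw the mean line on top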

End of clip_38.0s.jl

This page was generated using Literate.jl.