clip_38.0s

Load Julia packages (libraries) needed for the snippets in chapter 4

using StatisticalRethinking
using CmdStan, StanMCMCChain
gr(size=(500,800))
Plots.GRBackend()

CmdStan.jl uses a tmp subdirectory of the project directory to store the generated model files and the output of cmdstan

ProjDir = rel_path("..", "chapters", "04")
cd(ProjDir)
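
CmdStan.jl places these files in a tmp subdirectory of ProjDir (the paths in the output below show this). A minimal sketch, assuming that default layout, of making sure the directory exists before sampling:

tmpdir = joinpath(ProjDir, "tmp")   # tmp location used in this run
isdir(tmpdir) || mkdir(tmpdir)      # create it if it does not exist yet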

snippet 4.7

howell1 = CSV.read(joinpath(dirname(Base.pathof(StatisticalRethinking)), "..", "data", "Howell1.csv"), delim=';')
df = convert(DataFrame, howell1);

Use only adults

df2 = filter(row -> row[:age] >= 18, df)

352 rows × 4 columns (first 24 rows shown)

        height    weight    age       male
        Float64⍰  Float64⍰  Float64⍰  Int64⍰
   1    151.765   47.8256   63.0      1
   2    139.7     36.4858   63.0      0
   3    136.525   31.8648   65.0      0
   4    156.845   53.0419   41.0      1
   5    145.415   41.2769   51.0      0
   6    163.83    62.9926   35.0      1
   7    149.225   38.2435   32.0      0
   8    168.91    55.48     27.0      1
   9    147.955   34.8699   19.0      0
  10    165.1     54.4877   54.0      1
  11    154.305   49.8951   47.0      0
  12    151.13    41.2202   66.0      1
  13    144.78    36.0322   73.0      0
  14    149.9     47.7      20.0      0
  15    150.495   33.8493   65.3      0
  16    163.195   48.5627   36.0      1
  17    157.48    42.3258   44.0      1
  18    143.942   38.3569   31.0      0
  19    161.29    48.9879   39.0      1
  20    156.21    42.7227   29.0      0
  21    146.4     35.4936   56.0      1
  22    148.59    37.9033   45.0      0
  23    147.32    35.4652   19.0      0
  24    147.955   40.313    29.0      1
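
As a quick sanity check (a sketch, not part of the original script) that the filter kept only adults and that the row count matches the display above:

@assert all(df2[:age] .>= 18)   # every remaining row is an adult
size(df2, 1)                    # 352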

Define the Stan language model

weightsmodel = "
data {
 int < lower = 1 > N; // Sample size
 vector[N] height; // Outcome
 vector[N] weight; // Predictor
}

parameters {
 real alpha; // Intercept
 real beta; // Slope (regression coefficients)
 real < lower = 0 > sigma; // Error SD
}

model {
 height ~ normal(alpha + weight * beta , sigma);
}

generated quantities {
}
"
"\ndata {\n int < lower = 1 > N; // Sample size\n vector[N] height; // Predictor\n vector[N] weight; // Outcome\n}\n\nparameters {\n real alpha; // Intercept\n real beta; // Slope (regression coefficients)\n real < lower = 0 > sigma; // Error SD\n}\n\nmodel {\n height ~ normal(alpha + weight * beta , sigma);\n}\n\ngenerated quantities {\n}\n"

Define the Stanmodel and set the output format to :mcmcchain.

stanmodel = Stanmodel(name="weights", monitors=["alpha", "beta", "sigma"], model=weightsmodel,
  output_format=:mcmcchain)

Input data for cmdstan

heightsdata = [
  Dict("N" => length(df2[:height]), "height" => df2[:height], "weight" => df2[:weight])
]
=====> /home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04


File /home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/weights.stan will be updated.

1-element Array{Dict{String,Any},1}:
 Dict("height"=>Union{Missing, Float64}[151.765, 139.7, 136.525, 156.845, 145.415, 163.83, 149.225, 168.91, 147.955, 165.1  …  156.21, 160.655, 146.05, 156.21, 152.4, 162.56, 142.875, 162.56, 156.21, 158.75],"weight"=>Union{Missing, Float64}[47.8256, 36.4858, 31.8648, 53.0419, 41.2769, 62.9926, 38.2435, 55.48, 34.8699, 54.4877  …  44.0268, 47.8823, 39.4058, 41.0501, 40.8233, 47.0318, 34.2462, 52.1631, 54.0625, 52.5316],"N"=>352)

Sample using cmdstan

rc, chn, cnames = stan(stanmodel, heightsdata, ProjDir, diagnostics=false,
  CmdStanDir=CMDSTAN_HOME)

Describe the draws

describe(chn)


--- Translating Stan model to C++ code ---
bin/stanc  /home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/weights.stan --o=/home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/weights.hpp
Model name=weights_model
Input file=/home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/weights.stan
Output file=/home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/weights.hpp

--- Linking C++ model ---
g++ -Wall -I . -isystem stan/lib/stan_math/lib/eigen_3.3.3 -isystem stan/lib/stan_math/lib/boost_1.66.0 -isystem stan/lib/stan_math/lib/sundials_3.1.0/include -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -I src -isystem stan/src -isystem stan/lib/stan_math/ -DFUSION_MAX_VECTOR_SIZE=12 -Wno-unused-local-typedefs -DEIGEN_NO_DEBUG -DNO_FPRINTF_OUTPUT -pipe   src/cmdstan/main.cpp  -O3 -o /home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/weights -include /home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/weights.hpp stan/lib/stan_math/lib/sundials_3.1.0/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_3.1.0/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_3.1.0/lib/libsundials_idas.a

Length of data array is not equal to nchains,
all chains will use the first data dictionary.

Calling /home/travis/cmdstan/bin/stansummary to infer across chains.

Inference for Stan model: weights_model
4 chains: each with iter=(1000,1000,1000,1000); warmup=(0,0,0,0); thin=(1,1,1,1); 4000 iterations saved.

Warmup took (0.64, 0.67, 0.59, 0.61) seconds, 2.5 seconds total
Sampling took (0.73, 0.63, 0.59, 0.62) seconds, 2.6 seconds total

                Mean     MCSE  StdDev     5%   50%   95%    N_Eff  N_Eff/s    R_hat
lp__            -747  3.7e-02     1.3   -750  -747  -746  1.2e+03  4.6e+02  1.0e+00
accept_stat__   0.92  2.9e-03    0.11   0.68  0.97   1.0  1.5e+03  5.9e+02  1.0e+00
stepsize__      0.11     -nan   0.016  0.081  0.11  0.12     -nan     -nan  8.0e+13
treedepth__      4.0  2.4e-02     1.2    2.0   4.0   5.0  2.3e+03  9.1e+02  1.0e+00
n_leapfrog__      26  1.2e+00      16    3.0    31    63  1.8e+02  6.9e+01  1.0e+00
divergent__     0.00     -nan    0.00   0.00  0.00  0.00     -nan     -nan     -nan
energy__         749  5.0e-02     1.8    747   749   752  1.3e+03  5.0e+02  1.0e+00
alpha            114  5.2e-02     1.9    111   114   117  1.4e+03  5.4e+02  1.0e+00
beta            0.91  1.1e-03   0.043   0.84  0.91  0.97  1.4e+03  5.4e+02  1.0e+00
sigma            5.1  5.0e-03    0.20    4.8   5.1   5.4  1.6e+03  6.1e+02  1.0e+00

Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).

Iterations = 1:1000
Thinning interval = 1
Chains = 1,2,3,4
Samples per chain = 1000

Empirical Posterior Estimates:
          Mean          SD       Naive SE       MCSE      ESS
alpha 113.86801250 1.928870896 0.0304981267 0.0542864187 1000
 beta   0.90528437 0.042588843 0.0006733887 0.0012060295 1000
sigma   5.10472534 0.197426474 0.0031215866 0.0048241412 1000

Quantiles:
          2.5%       25.0%       50.0%        75.0%        97.5%
alpha 110.0977500 112.5660000 113.8460000 115.18650000 117.70237500
 beta   0.8219799   0.8760902   0.9054555   0.93417425   0.98891355
sigma   4.7394502   4.9717750   5.0947800   5.23358250   5.51619950
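
To read these estimates: at the posterior means, the expected height of a 50 kg adult is roughly 113.9 + 0.91 * 50 ≈ 159 cm. A sketch of computing this directly from the chain (columns 1 and 2 hold the alpha and beta draws, as in the plotting code further below):

mean(chn.value[:, 1, :]) + mean(chn.value[:, 2, :]) * 50.0   # expected height (cm) at weight = 50 kg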

Compare with a previous result

clip_38s_example_output = "

Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).

Iterations = 1:1000
Thinning interval = 1
Chains = 1,2,3,4
Samples per chain = 1000

Empirical Posterior Estimates:
          Mean         SD       Naive SE       MCSE     ESS
alpha 113.82267275 1.89871177 0.0300212691 0.053895503 1000
 beta   0.90629952 0.04155225 0.0006569987 0.001184630 1000
sigma   5.10334279 0.19755211 0.0031235731 0.004830464 1000

Quantiles:
          2.5%       25.0%       50.0%       75.0%       97.5%
alpha 110.1927000 112.4910000 113.7905000 115.1322500 117.5689750
 beta   0.8257932   0.8775302   0.9069425   0.9357115   0.9862574
sigma   4.7308260   4.9644050   5.0958800   5.2331875   5.5133417

"
"\n\nSamples were drawn using hmc with nuts.\nFor each parameter, N_Eff is a crude measure of effective sample size,\nand R_hat is the potential scale reduction factor on split chains (at\nconvergence, R_hat=1).\n\nIterations = 1:1000\nThinning interval = 1\nChains = 1,2,3,4\nSamples per chain = 1000\n\nEmpirical Posterior Estimates:\n          Mean         SD       Naive SE       MCSE     ESS\nalpha 113.82267275 1.89871177 0.0300212691 0.053895503 1000\n beta   0.90629952 0.04155225 0.0006569987 0.001184630 1000\nsigma   5.10334279 0.19755211 0.0031235731 0.004830464 1000\n\nQuantiles:\n          2.5%       25.0%       50.0%       75.0%       97.5%\nalpha 110.1927000 112.4910000 113.7905000 115.1322500 117.5689750\n beta   0.8257932   0.8775302   0.9069425   0.9357115   0.9862574\nsigma   4.7308260   4.9644050   5.0958800   5.2331875   5.5133417\n\n"

Plot the traces and densities of the posterior draws

plot(chn)
[Figure: trace plots (sample value vs. iteration) and posterior density plots for alpha, beta, and sigma]

Plot the regression line (at the posterior means) and the observations

xi = 30.0:0.1:65.0                       # weight grid for the regression line
rws, vars, chns = size(chn[:, 1, :])     # number of draws, variables and chains
alpha_vals = convert(Vector{Float64}, reshape(chn.value[:, 1, :], (rws*chns)))  # intercept draws
beta_vals = convert(Vector{Float64}, reshape(chn.value[:, 2, :], (rws*chns)))   # slope draws
yi = mean(alpha_vals) .+ mean(beta_vals)*xi   # regression line at the posterior means

scatter(df2[:weight], df2[:height], lab="Observations")
plot!(xi, yi, lab="Regression line")
[Figure: scatter plot of the height vs. weight observations with the fitted regression line]
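
A possible extension of this plot (not part of the original script): overlay a handful of regression lines drawn from the posterior samples to visualize the uncertainty around the mean line.

for i in 1:20                            # 20 randomly chosen posterior draws
  j = rand(1:length(alpha_vals))
  plot!(xi, alpha_vals[j] .+ beta_vals[j]*xi, color=:lightblue, alpha=0.3, lab="")
end
plot!(xi, yi, lab="Regression line")     # redraw the mean line on top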

End of clip_38.0s.jl

This page was generated using Literate.jl.