clip_07.1s · StatisticalRethinking.jl

Load Julia packages (libraries) needed for the snippets in chapter 0

using StatisticalRethinking
using CmdStan, StanMCMCChain
gr(size=(500,800))

Plots.GRBackend()

CmdStan uses a tmp directory to store the output of cmdstan

ProjDir = rel_path("..", "chapters", "04")
cd(ProjDir)

snippet 4.7

howell1 = CSV.read(joinpath(dirname(Base.pathof(StatisticalRethinking)), "..", "data", "Howell1.csv"), delim=';')
df = convert(DataFrame, howell1);

Use only adults

df2 = filter(row -> row[:age] >= 18, df)

352 rows × 4 columns

	height	weight	age	male
	Float64⍰	Float64⍰	Float64⍰	Int64⍰
1	151.765	47.8256	63.0	1
2	139.7	36.4858	63.0	0
3	136.525	31.8648	65.0	0
4	156.845	53.0419	41.0	1
5	145.415	41.2769	51.0	0
6	163.83	62.9926	35.0	1
7	149.225	38.2435	32.0	0
8	168.91	55.48	27.0	1
9	147.955	34.8699	19.0	0
10	165.1	54.4877	54.0	1
11	154.305	49.8951	47.0	0
12	151.13	41.2202	66.0	1
13	144.78	36.0322	73.0	0
14	149.9	47.7	20.0	0
15	150.495	33.8493	65.3	0
16	163.195	48.5627	36.0	1
17	157.48	42.3258	44.0	1
18	143.942	38.3569	31.0	0
19	161.29	48.9879	39.0	1
20	156.21	42.7227	29.0	0
21	146.4	35.4936	56.0	1
22	148.59	37.9033	45.0	0
23	147.32	35.4652	19.0	0
24	147.955	40.313	29.0	1
⋮	⋮	⋮	⋮	⋮

Define the Stan language model

heightsmodel = "
// Inferring a Rate
data {
  int N;
  real<lower=0> h[N];
}
parameters {
  real<lower=0> sigma;
  real<lower=0,upper=250> mu;
}
model {
  // Priors for mu and sigma
  mu ~ uniform(100, 250);
  sigma ~ cauchy( 0 , 1 );

  // Observed heights
  h ~ normal(mu, sigma);
}
"

"\n// Inferring a Rate\ndata {\n  int N;\n  real<lower=0> h[N];\n}\nparameters {\n  real<lower=0> sigma;\n  real<lower=0,upper=250> mu;\n}\nmodel {\n  // Priors for mu and sigma\n  mu ~ uniform(100, 250);\n  sigma ~ cauchy( 0 , 1 );\n\n  // Observed heights\n  h ~ normal(mu, sigma);\n}\n"

Define the Stanmodel and set the output format to :mcmcchain.

stanmodel = Stanmodel(name="heights", monitors = ["mu", "sigma"],model=heightsmodel,
  output_format=:mcmcchain)

Input data for cmdstan

heightsdata = [
  Dict("N" => length(df2[:height]), "h" => df2[:height])
]

=====> /home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04


File /home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/heights.stan will be updated.

1-element Array{Dict{String,Any},1}:
 Dict("N"=>352,"h"=>Union{Missing, Float64}[151.765, 139.7, 136.525, 156.845, 145.415, 163.83, 149.225, 168.91, 147.955, 165.1  …  156.21, 160.655, 146.05, 156.21, 152.4, 162.56, 142.875, 162.56, 156.21, 158.75])

Sample using cmdstan

rc, chn, cnames = stan(stanmodel, heightsdata, ProjDir, diagnostics=false,
  CmdStanDir=CMDSTAN_HOME)

Describe the draws

describe(chn)



--- Translating Stan model to C++ code ---
bin/stanc  /home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/heights.stan --o=/home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/heights.hpp
Model name=heights_model
Input file=/home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/heights.stan
Output file=/home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/heights.hpp

--- Linking C++ model ---
g++ -Wall -I . -isystem stan/lib/stan_math/lib/eigen_3.3.3 -isystem stan/lib/stan_math/lib/boost_1.66.0 -isystem stan/lib/stan_math/lib/sundials_3.1.0/include -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -I src -isystem stan/src -isystem stan/lib/stan_math/ -DFUSION_MAX_VECTOR_SIZE=12 -Wno-unused-local-typedefs -DEIGEN_NO_DEBUG -DNO_FPRINTF_OUTPUT -pipe   src/cmdstan/main.cpp  -O3 -o /home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/heights -include /home/travis/build/StanJulia/StatisticalRethinking.jl/docs/build/04/tmp/heights.hpp stan/lib/stan_math/lib/sundials_3.1.0/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_3.1.0/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_3.1.0/lib/libsundials_idas.a

Length of data array is not equal to nchains,
all chains will use the first data dictionary.

Calling /home/travis/cmdstan/bin/stansummary to infer across chains.

Inference for Stan model: heights_model
4 chains: each with iter=(1000,1000,1000,1000); warmup=(0,0,0,0); thin=(1,1,1,1); 4000 iterations saved.

Warmup took (0.029, 0.026, 0.025, 0.027) seconds, 0.11 seconds total
Sampling took (0.037, 0.046, 0.040, 0.052) seconds, 0.17 seconds total

                Mean     MCSE  StdDev    5%   50%   95%    N_Eff  N_Eff/s    R_hat
lp__            -895  2.3e-02     1.0  -897  -895  -894  2.0e+03  1.1e+04  1.0e+00
accept_stat__   0.92  1.4e-03   0.098  0.71  0.96   1.0  4.7e+03  2.7e+04  1.0e+00
stepsize__      0.81  2.4e-02   0.034  0.77  0.82  0.86  2.0e+00  1.1e+01  3.0e+13
treedepth__      2.1  1.0e-01    0.83   1.0   2.0   4.0  6.8e+01  3.9e+02  1.0e+00
n_leapfrog__     6.0  1.1e+00     8.4   1.0   3.0    23  5.9e+01  3.4e+02  1.0e+00
divergent__     0.00     -nan    0.00  0.00  0.00  0.00     -nan     -nan     -nan
energy__         896  3.5e-02     1.5   894   896   899  1.7e+03  9.9e+03  1.0e+00
sigma            7.7  5.4e-03    0.30   7.3   7.7   8.2  3.0e+03  1.7e+04  1.0e+00
mu               155  6.8e-03    0.42   154   155   155  3.8e+03  2.2e+04  1.0e+00

Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).

Iterations = 1:1000
Thinning interval = 1
Chains = 1,2,3,4
Samples per chain = 1000

Empirical Posterior Estimates:
         Mean        SD       Naive SE      MCSE     ESS
sigma   7.746845 0.29698506 0.004695746 0.005029119 1000
   mu 154.597421 0.41688256 0.006591492 0.006623318 1000

Quantiles:
         2.5%       25.0%       50.0%      75.0%      97.5%
sigma   7.183481   7.5444175   7.740005   7.938735   8.362853
   mu 153.803975 154.3120000 154.598000 154.877000 155.411100

Compare with previous result

clip_07s_example_output = "

Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).

Iterations = 1:1000
Thinning interval = 1
Chains = 1,2,3,4
Samples per chain = 1000

Empirical Posterior Estimates:
          Mean        SD      Naive SE      MCSE      ESS
sigma   7.7641718 0.3055115 0.004830561 0.0047596714 1000
   mu 154.6042417 0.4158242 0.006574758 0.0068304868 1000

Quantiles:
         2.5%       25.0%       50.0%      75.0%      97.5%
sigma   7.198721   7.5573575   7.749435   7.960795   8.393317
   mu 153.795975 154.3307500 154.610000 154.884000 155.391050

"

"\n\nSamples were drawn using hmc with nuts.\nFor each parameter, N_Eff is a crude measure of effective sample size,\nand R_hat is the potential scale reduction factor on split chains (at\nconvergence, R_hat=1).\n\nIterations = 1:1000\nThinning interval = 1\nChains = 1,2,3,4\nSamples per chain = 1000\n\nEmpirical Posterior Estimates:\n          Mean        SD      Naive SE      MCSE      ESS\nsigma   7.7641718 0.3055115 0.004830561 0.0047596714 1000\n   mu 154.6042417 0.4158242 0.006574758 0.0068304868 1000\n\nQuantiles:\n         2.5%       25.0%       50.0%      75.0%      97.5%\nsigma   7.198721   7.5573575   7.749435   7.960795   8.393317\n   mu 153.795975 154.3307500 154.610000 154.884000 155.391050\n\n"

Plot the density of posterior draws

density(chn)

Close cd(ProjDir) do block

This page was generated using Literate.jl.