TuringModels

Multinomial Poisson regression

On page 399 of Statistical Rethinking Edition 1, m13.2 is defined as

AiBinomial(ni,pi)logit(pi)=αdept[i]+βmiαdeptNormal(α,σ)αNormal(0,10)βNormal(0,1)σHalfCauchy(0,2) \begin{aligned} A_i &\sim \text{Binomial}(n_i, p_i) \\ \text{logit}(p_i) &= \alpha_\text{dept}[i] + \beta m_i \\ \alpha_\text{dept} &\sim \text{Normal}(\alpha, \sigma) \\ \alpha &\sim \text{Normal}(0, 10) \\ \beta &\sim \text{Normal}(0, 1) \\ \sigma &\sim \text{HalfCauchy}(0, 2) \end{aligned}
  1. Data
  2. Model
  3. Output
  4. Original output

Data

import CSV
import Random
import TuringModels

using DataFrames

Random.seed!(1)

data_path = joinpath(TuringModels.project_root, "data", "UCBadmit.csv")
df = CSV.read(data_path, DataFrame; delim=';')
12×5 DataFrame
 Row │ dept       gender     admit  reject  applications
     │ InlineSt…  InlineSt…  Int64  Int64   Int64
─────┼───────────────────────────────────────────────────
   1 │ A          male         512     313           825
   2 │ A          female        89      19           108
   3 │ B          male         353     207           560
   4 │ B          female        17       8            25
   5 │ C          male         120     205           325
   6 │ C          female       202     391           593
   7 │ D          male         138     279           417
   8 │ D          female       131     244           375
   9 │ E          male          53     138           191
  10 │ E          female        94     299           393
  11 │ F          male          22     351           373
  12 │ F          female        24     317           341

Model

using Turing

dept_map = Dict(key => idx for (idx, key) in enumerate(unique(df.dept)))
df.male = [g == "male" ? 1 : 0 for g in df.gender]
df.dept_id = [dept_map[de] for de in df.dept]

@model m13_2(applications, dept_id, male, admit) = begin
    sigma_dept ~ truncated(Cauchy(0, 2), 0, Inf)
    bm ~ Normal(0, 1)
    a ~ Normal(0, 10)
    a_dept ~ filldist(Normal(a, sigma_dept), 6)

    logit_p = a_dept[dept_id] + bm*male

    admit .~ BinomialLogit.(applications, logit_p)
end

chns = sample(
    m13_2(df.applications, df.dept_id, df.male, df.admit),
    NUTS(),
    1000
)
Chains MCMC chain (1000×21×1 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 7.08 seconds
Compute duration  = 7.08 seconds
parameters        = a, a_dept[3], a_dept[1], a_dept[4], a_dept[5], a_dept[2], sigma_dept, a_dept[6], bm
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse         ess      rhat   ess_per_sec
      Symbol   Float64   Float64    Float64   Float64     Float64   Float64       Float64

  sigma_dept    1.4710    0.5150     0.0163    0.0186    741.7618    0.9994      104.8427
          bm   -0.0951    0.0829     0.0026    0.0036    487.2112    0.9994       68.8638
           a   -0.6206    0.6469     0.0205    0.0210    906.8934    1.0028      128.1828
   a_dept[1]    0.6748    0.1040     0.0033    0.0047    592.1886    1.0009       83.7016
   a_dept[2]    0.6261    0.1259     0.0040    0.0052    523.7038    0.9990       74.0217
   a_dept[3]   -0.5854    0.0725     0.0023    0.0023   1073.1758    0.9994      151.6856
   a_dept[4]   -0.6164    0.0893     0.0028    0.0034    738.2637    1.0010      104.3482
   a_dept[5]   -1.0547    0.1026     0.0032    0.0026   1315.6662    0.9991      185.9599
   a_dept[6]   -2.6063    0.1507     0.0048    0.0037   1335.9061    0.9996      188.8207

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

  sigma_dept    0.8142    1.0963    1.3638    1.6869    2.7179
          bm   -0.2513   -0.1510   -0.0928   -0.0355    0.0584
           a   -1.8775   -0.9866   -0.6127   -0.2438    0.6782
   a_dept[1]    0.4792    0.6056    0.6711    0.7412    0.8866
   a_dept[2]    0.3785    0.5425    0.6201    0.7079    0.8734
   a_dept[3]   -0.7222   -0.6337   -0.5843   -0.5376   -0.4468
   a_dept[4]   -0.7930   -0.6775   -0.6144   -0.5550   -0.4435
   a_dept[5]   -1.2592   -1.1262   -1.0511   -0.9854   -0.8602
   a_dept[6]   -2.9191   -2.7052   -2.6029   -2.5081   -2.3074

Output

using StatsPlots

StatsPlots.plot(chns)

Original output

"""
Inference for Stan model: 359c2483e3bdbf74fd0484be27c2909b.
    3 chains, each with iter=4500; warmup=500; thin=1;
    post-warmup draws per chain=4000, total post-warmup draws=12000.

                   mean se_mean   sd     2.5%      25%      50%      75%    97.5%
    a_dept[1]      0.67    0.00 0.10     0.48     0.61     0.67     0.74     0.87
    a_dept[2]      0.63    0.00 0.12     0.40     0.55     0.63     0.71     0.85
    a_dept[3]     -0.58    0.00 0.08    -0.73    -0.63    -0.58    -0.53    -0.44
    a_dept[4]     -0.62    0.00 0.09    -0.79    -0.67    -0.62    -0.56    -0.45
    a_dept[5]     -1.06    0.00 0.10    -1.26    -1.13    -1.06    -0.99    -0.86
    a_dept[6]     -2.61    0.00 0.16    -2.92    -2.71    -2.60    -2.50    -2.31
    a             -0.59    0.01 0.66    -1.94    -0.97    -0.59    -0.21     0.73
    bm            -0.09    0.00 0.08    -0.25    -0.15    -0.09    -0.04     0.06
    sigma_dept     1.49    0.01 0.62     0.78     1.09     1.35     1.71     3.01
    lp__       -2602.27    0.04 2.27 -2607.74 -2603.48 -2601.90 -2600.61 -2598.95
               n_eff Rhat
    a_dept[1]   8191    1
    a_dept[2]   8282    1
    a_dept[3]  10974    1
    a_dept[4]  10034    1
    a_dept[5]  10783    1
    a_dept[6]  10344    1
    a           5839    1
    bm          7137    1
    sigma_dept  4626    1
    lp__        4200    1
""";