Benchmark Results

In this section, we report key benchmark results comparing Turing, CmdStan, and DynamicHMC on a variety of models. The code for each benchmark can be found in the Examples folder, and the corresponding model code is in the Models folder. The benchmarks were performed with the following software and hardware:

Julia 1.2.0
CmdStan 5.2.0
Turing 0.7.0
AdvancedHMC 0.2.6
DynamicHMC 2.1.0
Ubuntu 18.04
Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz

Before proceeding to the results, a few caveats should be noted. (1) Turing's performance may improve over time as it matures. (2) Memory allocations and garbage collection time are not reported for CmdStan because the heavy lifting is performed in C++, where Julia does not measure it. (3) Compared to Stan, Turing and DynamicHMC scale poorly, in large part because they use forward-mode autodiff, whose cost grows with the number of parameters. Once the reverse-mode autodiff package Zygote matures in Julia, it will become the default autodiff in MCMCBenchmarks.
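To make caveat (3) concrete, here is a minimal sketch of forward-mode differentiation with a hand-rolled dual number. The `Dual` type, the target `f`, and `forward_gradient` are hypothetical names written for this illustration; they are not the autodiff machinery used by Turing or DynamicHMC, and real packages batch several inputs per pass. The point is only that one pass yields one partial derivative, so the number of passes grows with the number of parameters, whereas reverse mode obtains the whole gradient from a single backward sweep.

```julia
# Minimal dual number for forward-mode differentiation (illustration only).
struct Dual
    val::Float64   # function value
    der::Float64   # derivative with respect to the seeded input
end

Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)

# Hypothetical target standing in for a log density: f(x) = sum of x_i^2.
function f(x)
    acc = x[1] * x[1]
    for i in 2:length(x)
        acc = acc + x[i] * x[i]
    end
    return acc
end

# Forward mode seeds one input at a time, so the full gradient needs
# one evaluation of f per parameter.
function forward_gradient(f, x::Vector{Float64})
    n = length(x)
    grad = zeros(n)
    passes = 0
    for i in 1:n
        duals = [Dual(x[j], j == i ? 1.0 : 0.0) for j in 1:n]
        grad[i] = f(duals).der
        passes += 1
    end
    return grad, passes
end

grad, passes = forward_gradient(f, [1.0, 2.0, 3.0])
```

With three parameters the sketch evaluates `f` three times; a model with hundreds of parameters would need hundreds of such passes per gradient.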

Gaussian

Model

\[\mu \sim Normal(0,1)\]

\[\sigma \sim TCauchy(0,5,0,\infty)\]

\[Y \sim Normal(\mu,\sigma)\]
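As a rough illustration of what each sampler evaluates for this model, the unnormalized log posterior can be written in plain Julia. This is a sketch under stated assumptions: the function names are hypothetical, and TCauchy(0,5,0,∞) is read as a Cauchy with location 0 and scale 5 truncated to positive values (a half-Cauchy). It is not the model code from the Models folder.

```julia
# Log density of Normal(m, s) at x.
normal_logpdf(x, m, s) = -0.5 * log(2π) - log(s) - 0.5 * ((x - m) / s)^2

# Log density of Cauchy(0, scale) truncated to (0, Inf).
# Truncating at the median doubles the density, hence the log(2) term.
function half_cauchy_logpdf(x, scale)
    x > 0 || return -Inf
    return log(2) - log(π * scale) - log1p((x / scale)^2)
end

# Unnormalized log posterior: log prior on mu and sigma plus log likelihood.
function gaussian_logjoint(mu, sigma, y)
    sigma > 0 || return -Inf
    lp = normal_logpdf(mu, 0.0, 1.0) + half_cauchy_logpdf(sigma, 5.0)
    for yi in y
        lp += normal_logpdf(yi, mu, sigma)
    end
    return lp
end

lp = gaussian_logjoint(0.0, 1.0, [0.3, -0.2, 1.1])
```

The loop over `y` is why the cost of one evaluation grows linearly with the number of data points Nd.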

benchmark design

#Number of data points
Nd = [10, 100, 1000, 10_000]
#Number of simulations
Nreps = 50
options = (Nsamples=2000, Nadapt=1000, delta=.8, Nd=Nd)

speed

allocations

effective sample size

Signal Detection Theory

Model

\[d \sim Normal(0,1/\sqrt{.5})\]

\[c \sim Normal(0,1/\sqrt{2})\]

\[\theta_{hits} = \Phi(d/2-c)\]

\[\theta_{fas} = \Phi(-d/2-c)\]

\[n_{hits} \sim Binomial(N,\theta_{hits})\]

\[n_{fas} \sim Binomial(N,\theta_{fas})\]
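The mapping from d and c to hit and false-alarm counts can be sketched as a small simulation. Everything named here (`erf_approx`, `Phi`, `sdt_probs`, `rand_binomial`, `simulate_sdt`) is hypothetical and written for this illustration. To keep the block free of package dependencies, the standard normal CDF uses the Abramowitz and Stegun 7.1.26 approximation to erf (absolute error about 1.5e-7); in practice a statistics package would supply the CDF and the binomial sampler.

```julia
using Random

# Approximate erf (Abramowitz and Stegun 7.1.26), used only to avoid dependencies.
function erf_approx(x)
    t = 1 / (1 + 0.3275911 * abs(x))
    poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
           t * (-1.453152027 + t * 1.061405429))))
    y = 1 - poly * exp(-x^2)
    return x >= 0 ? y : -y
end

# Standard normal CDF.
Phi(x) = 0.5 * (1 + erf_approx(x / sqrt(2)))

# Hit and false-alarm probabilities from discriminability d and criterion c.
sdt_probs(d, c) = (hits = Phi(d / 2 - c), fas = Phi(-d / 2 - c))

# Draw Binomial(N, p) as a count of N Bernoulli successes (fine for small N).
rand_binomial(rng, N, p) = count(_ -> rand(rng) < p, 1:N)

# Simulate hit and false-alarm counts for N signal and N noise trials.
function simulate_sdt(rng, N, d, c)
    p = sdt_probs(d, c)
    return (hits = rand_binomial(rng, N, p.hits), fas = rand_binomial(rng, N, p.fas))
end

rng = MersenneTwister(1)
data = simulate_sdt(rng, 100, 2.0, 0.0)
```

With c = 0 the criterion is unbiased, so the hit and false-alarm probabilities are symmetric around 0.5.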

benchmark design

#Number of data points
Nd = [10, 100, 1000, 10_000]
#Number of simulations
Nreps = 100
options = (Nsamples=2000, Nadapt=1000, delta=.8, Nd=Nd)

speed

allocations

effective sample size

Linear Regression

Model

\[\beta_0 \sim Normal(0,10)\]

\[\boldsymbol{\beta} \sim Normal(0,10)\]

\[\sigma \sim TCauchy(0,5,0,\infty)\]

\[\mu = \beta_0 + \boldsymbol{X}\boldsymbol{\beta}\]

\[Y \sim Normal(\mu,\sigma)\]
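A short sketch of how data consistent with this model could be generated for one benchmark condition. The function `simulate_regression` and its default parameter values are assumptions made for illustration; here the coefficients are fixed rather than drawn from their priors, and the predictors are standard normal, which may differ from the simulation code in the repository.

```julia
using Random

# Simulate a design matrix and responses for Nd observations and Nc predictors.
function simulate_regression(rng, Nd, Nc; beta0 = 1.0, beta = fill(0.5, Nc), sigma = 1.0)
    X = randn(rng, Nd, Nc)               # predictors
    mu = beta0 .+ X * beta               # linear predictor, one value per observation
    y = mu .+ sigma .* randn(rng, Nd)    # Normal(mu, sigma) responses
    return X, y, mu
end

X, y, mu = simulate_regression(MersenneTwister(42), 100, 2)
```

The matrix product `X * beta` is the only step whose cost depends on both Nd and Nc, which is why the design varies the two together.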

benchmark design

#Number of data points
Nd = [10, 100, 1000]
#Number of coefficients
Nc = [2, 3]
#Number of simulations
Nreps = 50
options = (Nsamples=2000, Nadapt=1000, delta=.8, Nd=Nd, Nc=Nc)
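To gauge the size of this design, the conditions can be enumerated directly. The sketch below rests on two assumptions that the listing does not state: that Nd and Nc are fully crossed, and that each condition is repeated Nreps times for each of the three samplers named in the introduction. The variable names `conditions`, `samplers`, and `total_runs` are hypothetical.

```julia
# Design values copied from the listing above.
Nd = [10, 100, 1000]
Nc = [2, 3]
Nreps = 50

# Assumption: conditions are the full Cartesian product of Nd and Nc.
conditions = vec([(Nd = nd, Nc = nc) for nd in Nd, nc in Nc])

# Assumption: every condition is repeated Nreps times for each sampler.
samplers = ("Turing", "CmdStan", "DynamicHMC")
total_runs = length(conditions) * Nreps * length(samplers)
```

Under these assumptions the design has 6 conditions and 900 sampler runs in total.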

speed

allocations

effective sample size

Linear Ballistic Accumulator (LBA)

Model

\[\tau \sim TNormal(.4,.1,0,y_{min})\]

\[A \sim TNormal(.8,.4,0,\infty)\]

\[k \sim TNormal(.2,.3,0,\infty)\]

\[v \sim Normal(0,3)\]

\[(t,c) \sim LBA(A,b,v,s,\tau)\]

where

\[t = y_i - \tau\]

\[b = A + k\]

\[s = 1\]

\[LBA(A,b,v,s,\tau) = f_c(t)\prod_{j \neq c} (1-F_j(t))\]

\[f_c(t) = \frac{1}{A} \left[-v_c \Phi\left( \frac{b-A-tv_c}{ts} \right) + s \phi\left( \frac{b-A-tv_c}{ts} \right) + v_c \Phi\left( \frac{b-tv_c}{ts} \right) - s \phi\left( \frac{b-tv_c}{ts} \right) \right]\]

\[\begin{multline*} F_c(t) = 1 + \frac{b-A-tv_c}{A} \Phi\left( \frac{b-A-tv_c}{ts} \right) - \frac{b-tv_c}{A} \Phi\left( \frac{b-tv_c}{ts} \right)\\ + \frac{ts}{A} \phi\left( \frac{b-A-tv_c}{ts} \right) - \frac{ts}{A} \phi\left( \frac{b-tv_c}{ts} \right) \end{multline*}\]

\[Y = \{y_1,\dots,y_n\}\]

\[y_{min} = minimum(Y)\]
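The LBA likelihood can be sketched in plain Julia as follows. The function names are hypothetical, and this is not the implementation from the Models folder. The density `lba_pdf` is written as the time derivative of F_c(t), and `t` is the decision time, that is, the observed response time minus the non-decision time. As an additional assumption, the standard normal CDF uses the Abramowitz and Stegun 7.1.26 approximation to erf so that the block has no package dependencies.

```julia
# Approximate erf (Abramowitz and Stegun 7.1.26), used only to avoid dependencies.
function erf_approx(x)
    t = 1 / (1 + 0.3275911 * abs(x))
    poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
           t * (-1.453152027 + t * 1.061405429))))
    y = 1 - poly * exp(-x^2)
    return x >= 0 ? y : -y
end

Phi(x) = 0.5 * (1 + erf_approx(x / sqrt(2)))   # standard normal CDF
phi(x) = exp(-0.5 * x^2) / sqrt(2π)            # standard normal PDF

# CDF of the finishing time of one accumulator with drift v.
function lba_cdf(t, v, A, b, s)
    z1 = (b - A - t * v) / (t * s)
    z2 = (b - t * v) / (t * s)
    return 1 + (b - A - t * v) / A * Phi(z1) - (b - t * v) / A * Phi(z2) +
           t * s / A * phi(z1) - t * s / A * phi(z2)
end

# PDF of the finishing time: the derivative of lba_cdf with respect to t.
function lba_pdf(t, v, A, b, s)
    z1 = (b - A - t * v) / (t * s)
    z2 = (b - t * v) / (t * s)
    return (-v * Phi(z1) + s * phi(z1) + v * Phi(z2) - s * phi(z2)) / A
end

# Density of choosing accumulator c at decision time t:
# accumulator c finishes at t and every other accumulator has not finished yet.
function lba_density(t, c, v, A, b, s)
    d = lba_pdf(t, v[c], A, b, s)
    for j in eachindex(v)
        j == c && continue
        d *= 1 - lba_cdf(t, v[j], A, b, s)
    end
    return d
end

# Example with two accumulators, A = 0.8, k = 0.2 (so b = A + k = 1.0), and s = 1.
dens = lba_density(0.5, 1, [1.0, 0.5], 0.8, 1.0, 1.0)
```

A full likelihood would sum `log(lba_density(...))` over trials after subtracting the non-decision time from each response time.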

benchmark design

#Number of data points
Nd = [10, 50, 200]
#Number of simulations
Nreps = 50
options = (Nsamples=2000, Nadapt=1000, delta=.8, Nd=Nd)

speed

allocations

effective sample size

Poisson Regression

Model

\[a_0 \sim Normal(0,10)\]

\[a_1 \sim Normal(0,1)\]

\[\sigma_{a0} \sim TCauchy(0,1,0,\infty)\]

\[a_{0i} \sim Normal(0,\sigma_{a0})\]

\[\lambda_i = e^{a_0 + a_{0i} + a_1 x_i}\]

\[y_i \sim Poisson(\lambda_i)\]
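The likelihood part of this model can be sketched as below. The names `log_factorial`, `poisson_loglik`, and the `unit` index vector (mapping each observation to its unit) are assumptions introduced for illustration; the priors on a_0, a_1, the unit intercepts, and their scale are omitted.

```julia
# log(y!) by direct summation; adequate for the small counts in this sketch.
function log_factorial(y::Integer)
    acc = 0.0
    for k in 2:y
        acc += log(k)
    end
    return acc
end

# Poisson log likelihood with a global intercept a0, slope a1,
# and one random intercept a0i[u] per unit u.
function poisson_loglik(a0, a1, a0i, x, y, unit)
    ll = 0.0
    for i in eachindex(y)
        eta = a0 + a0i[unit[i]] + a1 * x[i]   # log of the Poisson rate
        ll += y[i] * eta - exp(eta) - log_factorial(y[i])
    end
    return ll
end

# Two units with two observations each.
ll = poisson_loglik(0.5, 0.2, [0.1, -0.1], [0.0, 1.0, 0.0, 1.0], [1, 3, 0, 2], [1, 1, 2, 2])
```

Each unit adds one intercept to the parameter vector, so the number of parameters grows with Ns while Nd only changes the number of likelihood terms.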

benchmark design

# Number of data points per unit
Nd = [1, 2, 5]
# Number of units in model
Ns = 10
# Number of simulations
Nreps = 25
options = (Nsamples=2000, Nadapt=1000, delta=.8, Nd=Nd, Ns=Ns)

speed

allocations

effective sample size

Forward vs. Reverse Autodiff

Hierarchical Poisson

benchmark design

# Number of data points per unit
Nd = 1
# Number of units in model
Ns = [10, 20, 50]
# Number of simulations
Nreps = 20
autodiff = [:forward, :reverse]
options = (Nsamples=2000, Nadapt=1000, delta=.8, Nd=Nd, Ns=Ns, autodiff=autodiff)

speed

allocations

effective sample size