Why go for Hamiltonian Monte Carlo

Especially convenient for Bayesians \[ \pi(\theta) \propto \mathrm{prior}(\theta) \times \mathrm{likelihood}(\theta)\] but also useful for frequentists working on latent class models with SEM-based algorithms.

A word of warning:

“He who loves practice without theory is like the sailor who boards ship without a rudder and compass and never knows where he may be cast.” (L. da Vinci)

… although most friends of mine (with the exception of some Britons, maybe) would still go sailing, blindly relying on modern GPS.

Trying to see farther by standing on giants’ shoulders:

Much of the following material has been stolen from:

Hamiltonian Agenda

Physics

Gravity: Newton’s falling apple

With the dot notation for derivatives (and \(\vec{\gamma}\) denoting acceleration):

\[\begin{align} \vec{f} & =m\vec{\gamma}\\ mg & =m\ddot{x}\\ mg\dot{x} & =m\ddot{x}\dot{x}\\ 0 & =\frac{d}{dt}(-mgx+\frac{1}{2}mv^{2}) \end{align}\]

\[ \underset{\text{Energy}}{\underbrace{H(v,x)}}=\underset{\text{Potential}}{\underbrace{U(x)}}+\underset{\text{Kinetic}}{\underbrace{K(v)}} \]

Coiled spring: oscillations

\[\begin{align*} f & =m\gamma\\ -kx & =m\ddot{x}\\ -kx\dot{x} & =m\ddot{x}\dot{x}\\ 0 & =\frac{d}{dt}(\frac{1}{2}kx^{2}+\frac{1}{2}mv^{2}) \end{align*}\]

\[ \underset{\text{Energy}}{\underbrace{H(v,x)}}=\underset{\text{Potential}}{\underbrace{U(x)}}+\underset{\text{Kinetic}}{\underbrace{K(v)}} \]

More complex systems in physics obey Hamilton’s equations.

Usually, the total energy is written as:

\[\underset{\text{Energy}}{\underbrace{H(q,p)}}=\underset{\text{Potential}}{\underbrace{U(q)}}+\underset{\text{Kinetic}}{\underbrace{K(p)}}\]

\(p\) : generalized momentum vector \(\in\mathbb{R}^{d}\)

\(q\) : generalized position vector \(\in\mathbb{R}^{d}\)

\[\begin{align} \frac{dq}{dt} & =\frac{\partial H(q,p)}{\partial p}\\ \frac{dp}{dt} & =-\frac{\partial H(q,p)}{\partial q} \end{align}\]

From Physics to Maths: Hamiltonian properties

What are the properties of a function \(H(q,p)\) such that:

\[\begin{align*} \frac{dq}{dt} & =\frac{\partial H(q,p)}{\partial p}\\ \frac{dp}{dt} & =-\frac{\partial H(q,p)}{\partial q} \end{align*}\]

More compactly, as a single equation:

\[\frac{dz}{dt}=J\times\nabla H\]

with \(z=\left(\begin{array}[c]{c} q\\ p \end{array} \right)\), \(J=\left(\begin{array}[c]{cc} 0_{d\times d} & 1_{d\times d}\\ -1_{d\times d} & 0_{d\times d} \end{array} \right)\)
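
For intuition, a minimal R sketch of this vector field, assuming \(d=1\) and the harmonic-oscillator Hamiltonian \(H(q,p)=q^{2}/2+p^{2}/2\) used later:

J = rbind(c(0, 1), c(-1, 0))       # the 2x2 structure matrix for d = 1
gradH = function(z) c(z[1], z[2])  # (dH/dq, dH/dp) for H = q^2/2 + p^2/2
dzdt = function(z) J %*% gradH(z)  # Hamilton's equations in one line
dzdt(c(0, 1))                      # at (q, p) = (0, 1): dq/dt = 1, dp/dt = 0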

Hamiltonian property 1: Reversibility

The mapping \(\mathcal{T}_{s}\) : \(\left( t,\left( \begin{array}[c]{c} q(t)\\ p(t) \end{array} \right) \right) \longmapsto\left( t+s,\left( \begin{array}[c]{c} q(t+s)\\ p(t+s) \end{array}\right)\right)\) is one-to-one, thus reversible.

If \(H(q,p)=U(q)+K(p)\) with \(K(p)=K(-p)\), \(\mathcal{T}_{-s}\) is obtained by the sequence of operations \((p\rightarrow-p),\ \mathcal{T}_{s},\ (p\rightarrow-p)\).
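
A minimal numerical sketch of this momentum-flip reversibility, using the exact flow of \(H(q,p)=q^{2}/2+p^{2}/2\) (a rotation by angle \(s\)):

Ts = function(z, s) c( cos(s) * z[1] + sin(s) * z[2],   # exact flow of
                      -sin(s) * z[1] + cos(s) * z[2])   # H = q^2/2 + p^2/2
flip = function(z) c(z[1], -z[2])                       # p -> -p
z0 = c(0.3, 1.2)
flip(Ts(flip(Ts(z0, s = 0.7)), s = 0.7)) - z0  # numerically (0, 0): back to start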

Hamiltonian property 2: Conservation of \(H\) along the trajectories of \(\mathcal{T}\)

\[\begin{align*} \frac{dH}{dt} & =\left( \frac{\partial H(q,p)}{\partial p}\right) \frac {dp}{dt}+\left( \frac{\partial H(q,p)}{\partial q}\right) \frac{dq}{dt}\\ & =\frac{\partial H(q,p)}{\partial p}\left( -\frac{\partial H(q,p)}{\partial q}\right) +\frac{\partial H(q,p)}{\partial q}\left( \frac{\partial H(q,p)}{\partial p}\right) \\ & =0 \end{align*}\]
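
A direct check on the harmonic oscillator, whose exact trajectory from \((q,p)=(0,1)\) is \(q(t)=\sin t\), \(p(t)=\cos t\):

tt = seq(0, 2 * pi, length.out = 9)
H_t = sin(tt)^2 / 2 + cos(tt)^2 / 2  # H along the exact trajectory
range(H_t)                           # constant: 0.5 everywhere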

Hamiltonian property 3: Volume preservation along the trajectories of \(\mathcal{T}\)

A vector field \(\mathcal{T}\) with \(0\) divergence preserves volume:

\[\begin{align*} \nabla\cdot\mathcal{T} & =\frac{\partial}{\partial p}\left(\frac{dp}{dt}\right)+\frac{\partial}{\partial q}\left(\frac{dq}{dt}\right)\\ & =\frac{\partial}{\partial p}\left( -\frac{\partial H(q,p)}{\partial q}\right) +\frac{\partial}{\partial q}\left( \frac{\partial H(q,p)}{\partial p}\right) \\ & =-\frac{\partial^{2}H(q,p)}{\partial p\,\partial q} +\frac{\partial^{2}H(q,p)}{\partial p\,\partial q}=0 \end{align*}\]

i.e. the pdf is preserved under this change of variables, since the determinant of the Jacobian matrix \(J_{\delta}\) of the change of coordinates \(\mathcal{T}_{\delta}\) from \(s\) to \(s+\delta\) equals \(1\)!
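
Again on the harmonic oscillator, where \(\mathcal{T}_{\delta}\) is a rotation, the unit Jacobian determinant can be checked directly (a minimal sketch):

delta = 0.7
Jac = rbind(c(cos(delta), sin(delta)),   # Jacobian of T_delta for
            c(-sin(delta), cos(delta)))  # H = q^2/2 + p^2/2 (a rotation)
det(Jac)                                 # exactly 1: volume is preserved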

Numerical integration along the trajectories of \(\mathcal{T}\)

How to practically evaluate \(\mathcal{T}_{\delta}\left(\begin{array}[c]{c} q(t)\\ p(t) \end{array}\right)\) for a small \(\delta\)?

Euler’s discretisation

\[\begin{align*} \left(\begin{array}[c]{c} q(t+\delta)\\ p(t+\delta) \end{array}\right) & \approx\left(\begin{array}[c]{c} q(t)\\ p(t) \end{array}\right) +\delta\left(\begin{array}[c]{c} \frac{dq(t)}{dt}\\ \frac{dp(t)}{dt} \end{array}\right) \\ & =\left(\begin{array}[c]{c} q(t)+\delta\frac{\partial H(q,p)}{\partial p}\\ p(t)-\delta\frac{\partial H(q,p)}{\partial q} \end{array}\right) \end{align*}\]

If \(H(q,p)=U(q)+K(p)\) with \(K(p)=\frac{1}{2}p^{\prime}M^{-1}p\), then:

\[\begin{align*} \left(\begin{array}[c]{c} q(t+\delta)\\ p(t+\delta) \end{array} \right) & \approx\left( \begin{array}[c]{c} q(t)+\delta\times\frac{\partial K}{\partial p}(p(t))\\ p(t)-\delta\times\frac{\partial U}{\partial q}(q(t)) \end{array} \right) \\ & =\left( \begin{array}[c]{c} q(t)+\delta\times M^{-1}p(t)\\ p(t)-\delta\times\frac{\partial U}{\partial q}(q(t)) \end{array} \right) \end{align*}\]

Euler’s scheme may encounter problems… Let’s examine the example reproduced from Figure 5.1 of Neal, with \(H(q,p)=q^{2}/2+p^{2}/2\), whose exact trajectories are circles. (The initial state is \(q=0\), \(p=1\); \(L=20\) steps.)

Eulerstep = function(epsilon, L, current_q, current_p) {
  # Euler discretisation for H(q, p) = q^2/2 + p^2/2:
  # dq/dt = p and dp/dt = -q, both evaluated at the CURRENT point
  Q <- current_q
  P <- current_p
  for (i in 1:L) {
    q = current_q + epsilon * current_p
    p = current_p - epsilon * current_q
    current_p = p
    current_q = q
    Q = c(Q, q)
    P = c(P, p)
  }
  return(list(P = P, Q = Q))
}
angle = seq(0, 2 * pi, length.out = 1000)
plot(cos(angle), sin(angle), lwd = 4, type = 'l', lty = 3, las = 1,
     xlab = "position q", ylab = "momentum p", xlim = c(-2, 2), ylim = c(-2, 2))
xy = Eulerstep(epsilon = 0.3, L = 20, current_q = 0, current_p = 1)
lines(xy$Q, xy$P, type = "b", pch = 19, col = 'red') 

Euler’s modified discretisation

There are ways to improve the discretisation; for instance, the modified Euler scheme, whose two sequential updates are shear transformations, works as follows:

\[\begin{align*} q(t+\delta) & \approx q(t)+\delta\times M^{-1}p(\mathbf{t})\\ p(t+\delta) & \approx p(t)-\delta\times\frac{\partial U}{\partial q}(q(\mathbf{t+\delta})) \end{align*}\]
EulerModifstep = function(epsilon, L, current_q, current_p){
  # Modified Euler: update q first, then update p using the NEW q
  Q = current_q
  P = current_p
  for (i in 1:L) {
    current_q = current_q + epsilon * current_p
    current_p = current_p - epsilon * current_q
    Q = c(Q, current_q)
    P = c(P, current_p)
  }
  return(list(P = P, Q = Q))
}
plot(cos(angle), sin(angle), lwd = 4, type = 'l', lty = 3, las = 1,
     xlab = "position q", ylab = "momentum p", xlim = c(-2, 2), ylim = c(-2, 2))
xymod = EulerModifstep(epsilon = 0.3, L = 20, current_q = 0, current_p = 1)
lines(xymod$Q, xymod$P, col = 'blue', type = "b", pch = 19)  

Leapfrog discretisation

The leapfrog method improves on both Euler schemes: \[\begin{align*} p(t+\tfrac{\delta}{2}) & \approx p(t)-\frac{\delta}{2}\times\frac{\partial U}{\partial q}(q(t))\\ q(t+\delta) & \approx q(t)+\delta\times M^{-1}p(t+\tfrac{\delta}{2})\\ p(t+\delta) & \approx p(t+\tfrac{\delta}{2})-\frac{\delta}{2}\times\frac{\partial U}{\partial q}(q(t+\delta)) \end{align*}\]
Leapfrogstep = function(epsilon, L, current_q, current_p) {
  # Leapfrog: half step for p, full step for q, half step for p
  Q = current_q
  P = current_p
  for (i in 1:L) {
    current_p = current_p - epsilon * current_q / 2
    current_q = current_q + epsilon * current_p
    current_p = current_p - epsilon * current_q / 2
    Q = c(Q, current_q)
    P = c(P, current_p)
  }
  return(list(P = P, Q = Q))
}
plot(cos(angle), sin(angle), lwd = 4, type = 'l', lty = 3, las = 1,
     xlab = "position q", ylab = "momentum p", xlim = c(-2, 2), ylim = c(-2, 2)
     )
xyfrog = Leapfrogstep(epsilon = .3, L = 20, current_q = 0, current_p = 1)
lines(xyfrog$Q, xyfrog$P, type = "b", pch = 19, col = 'red', cex = 0.75) 

But the discretisation step must be chosen adequately:

plot(cos(angle), sin(angle), lwd = 4, type = 'l', lty = 3, las = 1,
     xlab = "position q", ylab = "momentum p", xlim = c(-2, 2), ylim = c(-2, 2)
     )
xyfrog2 = Leapfrogstep(epsilon = 1.2, L = 20, current_q = 0, current_p = 1)
lines(xyfrog2$Q, xyfrog2$P, type = "b", pch = 19, col = 'red') 
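
To quantify what the plots show, a small sketch reusing the trajectories xy, xymod, xyfrog and xyfrog2 computed above, tracking the Hamiltonian \(H=(q^{2}+p^{2})/2\) along each discretised path:

# H = (q^2 + p^2)/2 along each trajectory; the exact flow keeps H = 1/2
H_of = function(xy) (xy$Q^2 + xy$P^2) / 2
range(H_of(xy))       # Euler, epsilon = 0.3: H grows at every step
range(H_of(xymod))    # modified Euler: H oscillates but stays bounded
range(H_of(xyfrog))   # leapfrog, epsilon = 0.3: H stays close to 1/2
range(H_of(xyfrog2))  # leapfrog, epsilon = 1.2: larger error, still bounded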

From Maths to Stats: Hamiltonian MCMC

Hamiltonian MC: Canonical (Gibbs) distributions

  • Energy function \(H(z)\longmapsto [z] =\frac{1}{\mathrm{Cst}}\exp(-\frac{H(z)}{T})\)

  • Canonical distribution \([z]\longmapsto H(z)=-\log([z])-\log(\mathrm{Cst})\) (with temperature set to \(T=1\))

\(z=(q,p):\) \[\begin{align*} [q,p] & \mapsto H(q,p) \\ H(q,p) & =-\log([q,p])-\log(\mathrm{Cst})\\ & =-\log([q|p])-\log([p])-\log(\mathrm{Cst}) \end{align*}\]

The decomposition \(H(q,p)=U(q)+K(p)\) is equivalent to \(q\perp p\), since it amounts to \[ \log([q,p])=\log([q])+\log([p]) \]

  • Idea: \(q=\theta\), and \(p\) is an auxiliary variable doubling the dimension of the problem…

  • Note: Neither \(U(q)\) nor \(\nabla U(q)\) needs to know the normalizing constant \(\mathrm{Cst}\) of \([q]\), as the sketch below illustrates.
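
A minimal sketch of this last point (the names unnorm_post, U_demo and gradU_demo are illustrative only): the unknown constant merely shifts \(U\) and drops out of \(\nabla U\):

unnorm_post = function(q) exp(-(q - 1)^2 / 2)  # N(1, 1) without its 1/sqrt(2*pi)
U_demo = function(q) -log(unnorm_post(q))      # an unknown Cst would only shift U_demo
gradU_demo = function(q) q - 1                 # ...and drops out of the gradient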

Basics of Hamiltonian MC

  1. Principled framework:
    • \(q\in\mathbb{R}^{n}\longmapsto\left( \begin{array}[c]{c} q\\ p \end{array}\right) \in\mathbb{R}^{2\times n}\), \([q]\) targeted unnormalized posterior

    • \([p]\) arbitrary, in practice \(q\perp p\), and \([p]\propto\exp(-\frac{1}{2}p^{\prime}M^{-1}p)\)

    • \(H(q,p)=-\log([q])-\log([p])\)

  2. Perform numerical integration along the Hamiltonian trajectories: \((q,p)\rightrightarrows(q^{\ast},p^{\ast})\) by leapfrog(\(L,\delta\)) + negating momentum at the last step.

  3. MH-step with acceptance rate \(r_{accept}\) to correct the leapfrog discretisation, since it only approximately provides \(H\)-invariant trajectories of \(\mathcal{T}\): \[\begin{align*} r_{accept} & =\min(1,\exp\left\{ -H(q^{\ast},p^{\ast})+H(q,p)\right\} )\\ & =\min(1,\exp\left\{ U(q)-U(q^{\ast})+K(p)-K(p^{\ast})\right\} ) \end{align*}\]
  4. Reversibility of leapfrog steps + volume preservation + detailed balance property \(\Longrightarrow\exp(-H(q,p))\times dp \times dq\) left invariant.

  5. Number of steps \(L\) + discretization \(\delta\) as (dependent) tuning parameters.

Implementation of HMC

The R code hereafter relies on a basic implementation of Hamiltonian Monte Carlo given by Radford M. Neal in Figure 2 of “MCMC using Hamiltonian dynamics”, in the Handbook of Markov Chain Monte Carlo (2010).

The arguments to the HMC function are as follows:

  • U : function returning the potential energy \(U(q)\) at a given position

  • grad_U : function returning the gradient \(\nabla U(q)\)

  • epsilon : stepsize \(\delta\) of the leapfrog scheme

  • L : number of leapfrog steps per trajectory

  • current_q : current position \(q\) (a vector)

Momentum variables \(p\) are sampled from independent standard normal distributions within this function. The value returned is a list holding the new position variables (equal to \(\texttt{current_q}\) if the endpoint of the trajectory was rejected) and the final momentum.

Various illustrative variants of HMC are available from Neal’s web page.

HMC = function (U, grad_U, epsilon, L, current_q)
{
  q = current_q
  p = rnorm(length(q), 0, 1)  # independent standard normal variates
  current_p = p

  # Make a half step for momentum at the beginning

  p = p - epsilon * grad_U(q) / 2

  # Alternate full steps for position and momentum

  for (i in 1:L)
  {
    # Make a full step for the position

    q = q + epsilon * p

    # Make a full step for the momentum, except at end of trajectory

    if (i != L) p = p - epsilon * grad_U(q)
  }

  # Make a half step for momentum at the end.

  p = p - epsilon * grad_U(q) / 2

  # Negate momentum at end of trajectory to make the proposal symmetric

  p = -p

  # Evaluate potential and kinetic energies at start and end of trajectory

  current_U = U(current_q)
  current_K = sum(current_p^2) / 2
  proposed_U = U(q)
  proposed_K = sum(p^2) / 2

  # Accept or reject the state at end of trajectory, returning either
  # the position at the end of the trajectory or the initial position

  if (runif(1) < exp(current_U - proposed_U + current_K - proposed_K))
  {
    return (list(q = q, p = p) ) # accept
  }
  else
  {
    return (list(q = current_q, p = p))  # reject
  }
}
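
Before moving on, a quick sanity check (a sketch, not part of Neal's code): with \(U(q)=q^{2}/2\) the target is a standard normal, so the draws should have mean \(0\) and standard deviation \(1\):

# Sanity check: target N(0, 1), i.e. U(q) = q^2/2 and grad U(q) = q
set.seed(1)
qs = numeric(2000)  # qs[1] = 0 is the starting position
for (n in 2:2000) {
  qs[n] = HMC(function(q) sum(q^2) / 2, function(q) q,
              epsilon = 0.2, L = 20, current_q = qs[n - 1])$q
}
c(mean(qs), sd(qs))  # should be close to 0 and 1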
##############################
HMCstep = function (U, grad_U, epsilon, L, current_q)
{ # same as previous HMC code but keep track of each step
  Q <- q <- current_q
  p = rnorm(length(q), 0, 1) 
  P <- current_p <- p
  p = p - epsilon * grad_U(q) / 2
  P <- cbind(P, p)
  Q <- cbind(Q, q)
  for (i in 1:L)
  {
    q = q + epsilon * p
    Q <- cbind(Q, q)
    if (i != L) {
      p = p - epsilon * grad_U(q)
      P <- cbind(P, p)
    }
  }
  p = p - epsilon * grad_U(q) / 2
  P <- cbind(P, p)
  p = -p
  P <- cbind(P, p)
  Q <- cbind(Q, q)
  current_U = U(current_q)
  current_K = sum(current_p^2) / 2
  proposed_U = U(q)
  proposed_K = sum(p^2) / 2
  if (runif(1) < exp(current_U - proposed_U + current_K - proposed_K))
  {
    return (list(q = q, p = p, Q = Q, P = P) ) # accept
  }
  else
  {
    return (list(q = current_q, p = p, Q = Q, P = P))  # reject
  }
}

1-D illustration of HMC

Sampling a Student \(t_{5}\) distribution

U = function(q) { 3 * log(1 + (q^2) / 5) }  # -log of the t_5 density, up to a constant

gradU = function(q) { 3 * (2 * q / 5) / (1 + (q^2) / 5) }
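
A quick finite-difference check (a sketch; qs and eps are arbitrary choices) that gradU is indeed the derivative of U:

# central finite differences vs the analytic gradient: should be ~ 0
qs = seq(-4, 4, by = 0.5)
eps = 1e-6
max(abs(gradU(qs) - (U(qs + eps) - U(qs - eps)) / (2 * eps)))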
###############################
x <- y <- seq(-5, 5, len = 50)
H <- outer(U(x), 0.5 * y * y, "+")

library(ggplot2)  # pq_plot below also requires reshape2 and viridisLite

pq_plot <- function(H, x, y, n_levels = 25, transparency = 0.4, col = "white",
                    xtitre = "position q", ytitre = "momentum p") {
  theme_set(theme_bw(base_size = 16))
  dat <- reshape2::melt(H)
  dat$Var1 <- x[dat$Var1]
  dat$Var2 <- y[dat$Var2]
  g <- ggplot() +
    geom_tile(data = dat,
              aes(x = Var1, y = Var2, fill = value)
              ) +
    scale_fill_gradientn(name = "H", colors = viridisLite::viridis(256)) +
    geom_contour(data = dat,
                 aes(x = Var1, y = Var2, z = value),
                 bins = n_levels, color = col, alpha = transparency
                 ) +
    xlab(xtitre) + ylab(ytitre) +
    theme(legend.position = "top",
          legend.text = element_text(size = 10),
          plot.title = element_text(lineheight = 0.8, face = "bold"),
          axis.text = element_text(size = 12),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          panel.background = element_blank()
          )
  return(g)
}
pq_plot(H = H, x = x, y = y)

##################################STEP by STEP
epsilon = 0.3
L = 15
Q <- current_q <- 4                   
########################### CHANGE the settings above to watch the leapfrog
lstep = HMCstep(U, gradU, epsilon, L, current_q)
pq_plot(H = H, x = x, y = y) +
  geom_path(data = data.frame(x = drop(lstep$Q), y = drop(lstep$P)),
            aes(x = x, y = y), color = "white"
            ) +
  geom_point(data = data.frame(x = drop(lstep$Q), y = drop(lstep$P)),
             aes(x = x, y = y), color = "white"
             ) +
  coord_cartesian(xlim = range(x), ylim = range(y))

##################################One GO
epsilon = 0.3
L = 10
Q <- current_q <- 0
p <- q <- NULL

for (n in 1:1000) {
  l = HMC(U, gradU, epsilon, L, current_q)
  current_q = l$q
  Q <- c(Q, current_q)
  p <- c(p, l$p)
  q <- c(q, l$q)
  # points(l$q, l$p, pch = 19, cex = 0.5, col = 'white')
}; rm(n)

pq_plot(H = H, x = x, y = y) +
  geom_point(data = data.frame(x = q, y = p),
             aes(x = x, y = y), col = "white", alpha = 0.3
             ) +
  coord_cartesian(xlim = range(x), ylim = range(y))

hist(Q, freq = FALSE, nclass = 25)
lines(x, dt(x, 5), col = "red")
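
A rough check of mixing (a sketch: acf is base R; the effective sample size line assumes the coda package is installed, hence left commented):

acf(Q, lag.max = 20, main = "ACF of HMC draws")
# coda::effectiveSize(Q)  # effective sample size, if coda is available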

#####################################

Sampling a bimodal univariate distribution

rm(p, q, Q, H, L, epsilon, current_q, l, lstep, U, gradU)
#######################################
U = function(q) { -log(0.6 * exp(-0.5 * (q+2)^2) + 0.4 * exp(-0.5 * (q-2)^2)) }

gradU = function(q) {
  # numerator: minus d/dq of the mixture; denominator exp(-U(q)) is the mixture density
  (0.6 * (q + 2) * exp(-0.5 * (q + 2)^2) +
     0.4 * (q - 2) * exp(-0.5 * (q - 2)^2)) / exp(-U(q))
}

x <- y <- seq(-5, 5, len = 50)
H <- outer(U(x), 0.5 * y * y, "+")
pq_plot(H = H, x = x, y = y)

epsilon = 0.3
L = 15
Q <- current_q <- 4                   
lstep = HMCstep(U, gradU, epsilon, L, current_q)
pq_plot(H = H, x = x, y = y) +
  geom_path(data = data.frame(x = drop(lstep$Q), y = drop(lstep$P)),
            aes(x = x, y = y), color = "white"
            ) +
  geom_point(data = data.frame(x = drop(lstep$Q), y = drop(lstep$P)),
             aes(x = x, y = y), color = "white"
             )

########################### CHANGE the settings above to watch the leapfrog
epsilon = 0.3
L = 10
Q <- current_q <- 0
p <- q <- NULL

for (n in 1:1000) {
  l = HMC(U, gradU, epsilon, L, current_q)
  current_q = l$q
  Q <- c(Q, current_q)
  p <- c(p, l$p)
  q <- c(q, l$q)
}; rm(n)

pq_plot(H = H, x = x, y = y) +
  geom_point(data = data.frame(x = q, y = p),
             aes(x = x, y = y), col = "black", alpha = 0.3
             )

hist(Q, freq = FALSE, nclass = 25)
lines(x, 0.6 * dnorm(x, -2) + 0.4 * dnorm(x, 2), col = "red")
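
Because the two modes at \(\mp 2\) are well separated, it is worth checking how often the chain actually crosses between them. A crude sketch, classifying each draw by its sign (draws exactly at 0, like the starting value, are dropped):

modes = sign(Q[Q != 0])  # -1: left mode, +1: right mode
sum(diff(modes) != 0)    # number of crossings between the two modes
mean(modes == 1)         # proportion of time in the right mode, target 0.4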

2-D illustration of HMC

Sampling a correlated bivariate normal distribution

rm(p, q, Q, H, L, epsilon, current_q, l, lstep, U, gradU)
############################## correlated 2-D normal
rho = 0.85
Preci = matrix(c(1, -rho, -rho, 1), 2, 2) / (1 - rho * rho)  # precision matrix
U = function(q) { 0.5 * t(q) %*% Preci %*% q }
Umult = function(x, y) {
  q = matrix(c(x, y), nr = 2, nc = 1)
  return(U(q))
}
gradU = function(q) { Preci %*% q }
###############################
x <- y <- seq(-5, 5, len = 50)
H = matrix(0, nr = length(x), nc = length(y))
for (i in 1:50) { 
  for (j in 1:50) { H[i, j] = Umult(x[i], y[j]) } 
}; rm(i, j)
pq_plot(H = -H, x = x, y = y, xtitre = 'position q1', ytitre = 'position q2')
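
A quick check that the target has the intended covariance: inverting the precision matrix should recover the unit-variance covariance with correlation \(\rho=0.85\) (sketch):

solve(Preci)  # recovers matrix(c(1, rho, rho, 1), 2, 2), i.e. correlation 0.85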

##################################STEP by STEP
epsilon = 0.3
L = 15
Q <- current_q <- matrix(c(3, 5), nr = 2, nc = 1)                 
########################### CHANGE the settings above to watch the leapfrog
lstep = HMCstep(U, gradU, epsilon, L, current_q)
pq_plot(H = -H, x = x, y = y, xtitre = 'position q1', ytitre = 'position q2') +
  geom_path(data = data.frame(x = drop(lstep$Q[1, ]), y = drop(lstep$Q[2, ])),
            aes(x = x, y = y), color = "white"
            ) +
  geom_point(data = data.frame(x = drop(lstep$Q[1, ]), y = drop(lstep$Q[2, ])),
             aes(x = x, y = y), color = "white"
             )

##################################Global
epsilon = 0.3 # it diverges with 0.4
L = 10
Q <- current_q <- 3 * matrix(c(-4, 2), nr = 2, nc = 1) 
q <- NULL
for (n in 1:10000) {
  l = HMC(U, gradU, epsilon, L, current_q)
  current_q = l$q
  Q <- cbind(Q, current_q)
  q <- rbind(q, t(l$q))
}; rm(n)

pq_plot(H = -H, x = x, y = y, xtitre = 'position q1', ytitre = 'position q2') +
  geom_point(data = data.frame(x = q[, 1], y = q[, 2]),
             aes(x = x, y = y), col = "black", alpha = 0.3
             )

cor(t(Q))  # empirical correlation, to be compared with rho = 0.85
##           [,1]      [,2]
## [1,] 1.0000000 0.8412577
## [2,] 0.8412577 1.0000000