1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905

\documentclass{article}
\usepackage{natbib}
\usepackage{graphics}
\usepackage{amsmath}
\usepackage{indentfirst}
\usepackage{url}
\usepackage[utf8]{inputenc}
\DeclareMathOperator{\var}{var}
\DeclareMathOperator{\cov}{cov}
% \VignetteIndexEntry{MCMC Example}
\begin{document}
<<foo,include=FALSE,echo=FALSE>>=
options(keep.source = TRUE, width = 60)
@
\title{MCMC Package Example (Version \Sexpr{packageVersion("mcmc")})}
\author{Charles J. Geyer}
\maketitle
\section{The Problem}
This is an example of using the \verb@mcmc@ package in R. The problem comes
from a takehome question on a (takehome) PhD qualifying exam
(School of Statistics, University of Minnesota).
Simulated data for the problem are in the dataset \verb@logit@.
There are five variables in the data set, the response \verb@y@
and four predictors, \verb@x1@, \verb@x2@, \verb@x3@, and \verb@x4@.
A frequentist analysis for the problem is done by the following R statements
<<frequentist>>=
library(mcmc)
data(logit)
out < glm(y ~ x1 + x2 + x3 + x4, data = logit,
family = binomial, x = TRUE)
summary(out)
@
But this problem isn't about that frequentist analysis, we want a Bayesian
analysis. For our Bayesian analysis we assume the same data model as the
frequentist, and we assume the prior distribution of the five parameters
(the regression coefficients) makes them independent and identically
normally distributed with mean 0 and standard deviation 2.
The log unnormalized posterior density (log likelihood plus log prior)
for this model is calculated by
the following R function. In order to avoid using either global variables
or \verb@...@ arguments, we use the function factory pattern
(\citealp[Section~10.3.1]{wickham};
\citealp[Section~7.5, especially Subsection~7.5.4]{basic};
see also Appendix~\ref{sec:versus} below).
<<log.unnormalized.posterior>>=
lupost_factory < function(x, y) function(beta) {
eta < as.numeric(x %*% beta)
logp < ifelse(eta < 0, eta  log1p(exp(eta)),  log1p(exp( eta)))
logq < ifelse(eta < 0,  log1p(exp(eta)),  eta  log1p(exp( eta)))
logl < sum(logp[y == 1]) + sum(logq[y == 0])
return(logl  sum(beta^2) / 8)
}
lupost < lupost_factory(out$x, out$y)
@
The tricky calculation of the log likelihood avoids overflow and catastrophic
cancellation in calculation of $\log(p)$ and $\log(q)$ where
\begin{align*}
p & = \frac{\exp(\eta)}{1 + \exp(\eta)} = \frac{1}{1 + \exp( \eta)}
\\
q & = \frac{1}{1 + \exp(\eta)} = \frac{\exp( \eta)}{1 + \exp( \eta)}
\end{align*}
so taking logs gives
\begin{align*}
\log(p) & = \eta  \log(1 + \exp(\eta)) =  \log(1 + \exp( \eta))
\\
\log(q) & =  \log(1 + \exp(\eta)) =  \eta  \log(1 + \exp( \eta))
\end{align*}
To avoid overflow, we always chose the case where the argument of $\exp$
is negative. We have also avoided catastrophic cancellation when
$\lvert\eta\rvert$ is large. If $\eta$ is large and positive, then
\begin{align*}
p & \approx 1
\\
q & \approx 0
\\
\log(p) & \approx  \exp( \eta)
\\
\log(q) & \approx  \eta  \exp( \eta)
\end{align*}
and our use of the R function \texttt{log1p}, which calculates the
function $x \mapsto \log(1 + x)$
correctly for small $x$ avoids all problems. The case where $\eta$ is large
and negative is similar.
\section{Beginning MCMC}
With those definitions in place, the following code runs the Metropolis
algorithm to simulate the posterior.
<<metropolistry1>>=
set.seed(42) # to get reproducible results
beta.init < as.numeric(coefficients(out))
out < metrop(lupost, beta.init, 1e3)
names(out)
out$accept
@
The arguments to the \verb@metrop@ function used here (there are others
we don't use) are
\begin{itemize}
\item an R function (here \verb@lupost@) that evaluates the log unnormalized
density of the desired stationary distribution of the Markov chain
(here a posterior distribution). Note that (although this example
does not exhibit the phenomenon) that the unnormalized density may
be zero, in which case the log unnormalized density is \verb@Inf@.
\item an initial state (here \verb@beta.init@) of the Markov chain.
\item a number of batches (here \verb@1e3@) for the Markov chain.
This combines with batch length and spacing (both 1 by default)
to determine the number of iterations done.
\item additional arguments (here \verb@x@ and \verb@y@) supplied to
provided functions (here \verb@lupost@).
\item there is no ``burnin'' argument, although burnin is easily
accomplished, if desired (more on this below).
\end{itemize}
The output is in the component \verb@out$batch@ returned by the \verb@metrop@
function. We'll look at it presently, but first we need to adjust the
proposal to get a higher acceptance rate (\verb@out$accept@). It is generally
accepted \citep*{grg} that an acceptance rate of about 20\% is right, although
this recommendation is based on the asymptotic analysis of a toy problem
(simulating a multivariate normal distribution) for which one would never
use MCMC and is very unrepresentative of difficult MCMC applications.
\citet{geyertemp} came to a similar conclusion,
that a 20\% acceptance rate is about right, in a very different situation.
But they also warned that a 20\% acceptance rate could be very wrong
and produced
an example where a 20\% acceptance rate was impossible and attempting to
reduce the acceptance rate below 70\% would keep the sampler from ever
visiting part of the state space. So the 20\% magic number must be
considered like other rules of thumb we teach in intro courses
(like $n > 30$ means means normal approximation is valid).
We know these rules of thumb can fail.
There are examples in the literature where
they do fail. We keep repeating them because we want something simple to
tell beginners, and they are all right for some problems.
Be that as it may, we try for 20\%.
<<metropolistry2>>=
out < metrop(out, scale = 0.1)
out$accept
out < metrop(out, scale = 0.3)
out$accept
out < metrop(out, scale = 0.5)
out$accept
out < metrop(out, scale = 0.4)
out$accept
@
Here the first argument to each instance of the \verb@metrop@ function is
the output of a previous invocation. The Markov chain continues where
the previous run stopped, doing just what it would have done if it had
kept going, the initial state and random seed being the final state and
final random seed of the previous invocation. Everything stays the same
except for the arguments supplied (here \verb@scale@).
\begin{itemize}
\item The argument \verb@scale@ controls the size of the Metropolis
``normal random walk'' proposal. The default is \verb@scale = 1@.
Big steps give lower acceptance rates. Small steps give higher.
We want something about 20\%. It is also possible to make \verb@scale@
a vector or a matrix. See \verb@help(metrop)@.
\end{itemize}
Because each run starts where the last one stopped (when the first argument
to \verb@metrop@ is the output of the previous invocation), each run serves
as ``burnin'' for its successor (assuming that any part of that run was
worth anything at all).
\section{Diagnostics}
O.~K. That does it for the acceptance rate. So let's do a longer run
and look at the results.
<<label=metropolistry3>>=
out < metrop(out, nbatch = 1e4)
t.test(out$accept.batch)$conf.int
out$time
@
Here we do a Monte Carlo confidence interval
for the true unknown acceptance rate
(what we would see with an infinite Monte Carlo sample size).
Figure~\ref{fig:fig1} (page~\pageref{fig:fig1})
shows the time series plot made by the R statement
<<label=fig1too,include=FALSE>>=
plot(ts(out$batch))
@
\begin{figure}
\begin{center}
<<label=fig1,fig=TRUE,echo=FALSE>>=
<<fig1too>>
@
\end{center}
\caption{Time series plot of MCMC output.}
\label{fig:fig1}
\end{figure}
Another way to look at the output is an autocorrelation plot.
Figure~\ref{fig:fig2} (page~\pageref{fig:fig2})
shows the time series plot made by the R statement
<<label=fig2too,include=FALSE>>=
acf(out$batch)
@
\begin{figure}
\begin{center}
<<label=fig2,fig=TRUE,echo=FALSE>>=
<<fig2too>>
@
\end{center}
\caption{Autocorrelation plot of MCMC output.}
\label{fig:fig2}
\end{figure}
As with any multiplot plot, these are a bit hard to read. Readers are
invited to make the separate plots to get a better picture.
As with all ``diagnostic'' plots in MCMC, these don't ``diagnose''
subtle problems.
\begin{quotation}
The purpose of regression diagnostics is to find obvious, gross,
embarrassing problems that jump out of simple plots \citep{bogosity}.
\end{quotation}
The time series plots will show \emph{obvious} nonstationarity.
They will not show \emph{nonobvious} nonstationarity. They
provide no guarantee whatsoever that your Markov chain is sampling
anything remotely resembling the correct stationary distribution
(with log unnormalized density \verb@lupost@). In this very easy
problem, we do not expect any convergence difficulties and so believe
what the diagnostics seem to show, but one is a fool to trust such
diagnostics in difficult problems.
The autocorrelation plots seem to show that the
the autocorrelations are negligible after about lag 25.
This diagnostic inference is reliable if the sampler is actually
working (has nearly reached equilibrium) and worthless otherwise.
Thus batches of length 25 should be sufficient, but let's use
length 100 to be safe.
A more judicious discussion of asymptotics is found
in \citet[Section~1.11.5]{intro}, which says
\begin{quotation}
Many [diagnostics] come with theorems, but the theorems never prove
the property you really want a diagnostic to have. These theorems say
that if the chain converges, then the diagnostic will probably say that
the chain converged, but they do not say that if the chain pseudoconverges,
then the diagnostic will probably say that the chain did not converge.
\end{quotation}
\section{Monte Carlo Estimates and Standard Errors}
<<label=metropolistry4>>=
out < metrop(out, nbatch = 100, blen = 100,
outfun = function(z) c(z, z^2))
t.test(out$accept.batch)$conf.int
out$time
@
We have added an argument \verb@outfun@ that gives the functional
of the Markov chain \citep[Section~1.6]{intro} we want to average.
For this problem we are interested
in both posterior mean and variance. Mean is easy, just average the
variables in question. But variance is a little tricky. We need to
use the identity
$$
\var(X) = E(X^2)  E(X)^2
$$
to write variance as a function of two things that can be estimated
by simple averages. Hence we want to average the state itself and
the squares of each component. Hence our \verb@outfun@ returns
\verb@c(z, z^2)@ for an argument (the state vector) \verb@z@.
\subsection{Simple Means}
The grand means (means of batch means) are
<<label=metropolisbatch>>=
apply(out$batch, 2, mean)
@
The first 5 numbers are the Monte Carlo estimates of the posterior means.
The second 5 numbers are the Monte Carlo estimates of the posterior
ordinary second moments. We get the posterior variances by
<<label=metropolisbatchtoo>>=
foo < apply(out$batch, 2, mean)
mu < foo[1:5]
sigmasq < foo[6:10]  mu^2
mu
sigmasq
@
Monte Carlo standard errors (MCSE) are calculated from the batch means.
This is simplest for the means.
<<label=metropolismcsemu>>=
mu.mcse < apply(out$batch[ , 1:5], 2, sd) / sqrt(out$nbatch)
mu.mcse
@
The extra factor \verb@sqrt(out$nbatch)@ arises because the batch means
have variance $\sigma^2 / b$ where $b$ is the batch length, which is
\verb@out$blen@,
whereas the overall means \verb@mu@ have variance $\sigma^2 / n$ where
$n$ is the total number of iterations, which is \verb@out$blen * out$nbatch@.
\subsection{Functions of Means}
To get the MCSE for the posterior variances we apply the delta method.
Let $u_i$ denote the sequence of batch means of the first kind for one
parameter and $\bar{u}$ the grand mean (the estimate of the posterior mean
of that parameter),
let $v_i$ denote the sequence of batch means of the second kind for the
same parameter and $\bar{v}$ the grand mean (the estimate of the posterior
second absolute moment of that parameter), and let $\mu = E(\bar{u})$ and
$\nu = E(\bar{v})$. Then the delta method linearizes the nonlinear function
$$
g(\mu, \nu) = \nu  \mu^2
$$
as
$$
\Delta g(\mu, \nu) = \Delta \nu  2 \mu \Delta \mu
$$
saying that
$$
g(\bar{u}, \bar{v})  g(\mu, \nu)
$$
has the same asymptotic normal distribution as
$$
(\bar{v}  \nu)  2 \mu (\bar{u}  \mu)
$$
which, of course, has variance \verb@1 / nbatch@ times that of
$$
(v_i  \nu)  2 \mu (u_i  \mu)
$$
and this variance is estimated by
$$
\frac{1}{n_{\text{batch}}} \sum_{i = 1}^{n_{\text{batch}}}
\bigl[ (v_i  \bar{v})  2 \bar{u} (u_i  \bar{u}) \bigr]^2
$$
So
<<label=metropolismcsesigmasq>>=
u < out$batch[ , 1:5]
v < out$batch[ , 6:10]
ubar < apply(u, 2, mean)
vbar < apply(v, 2, mean)
deltau < sweep(u, 2, ubar)
deltav < sweep(v, 2, vbar)
foo < sweep(deltau, 2, ubar, "*")
sigmasq.mcse < sqrt(apply((deltav  2 * foo)^2, 2, mean) / out$nbatch)
sigmasq.mcse
@
does the MCSE for the posterior variance.
Let's just check that this complicated \verb@sweep@ and \verb@apply@ stuff
does do the right thing.
<<label=metropolismcsesigmasqtoo>>=
sqrt(mean(((v[ , 2]  vbar[2])  2 * ubar[2] * (u[ , 2]  ubar[2]))^2) /
out$nbatch)
@
\paragraph{Comment} Through version 0.5 of this vignette it contained
an incorrect procedure for calculating this MCSE, justified by a handwave
(which was incorrect).
Essentially, it said to use the standard deviation of the batch means called
\verb@v@ here, which appears to be very conservative.
\subsection{Functions of Functions of Means}
If we are also interested in the posterior standard deviation
(a natural question, although not asked on the exam problem),
the delta method gives its standard error in terms of that
for the variance
<<label=metropolismcsesigma>>=
sigma < sqrt(sigmasq)
sigma.mcse < sigmasq.mcse / (2 * sigma)
sigma
sigma.mcse
@
\section{A Final Run}
So that's it. The only thing left to do is a little more precision
(the exam problem directed ``use a long enough run of your Markov chain
sampler so that the MCSE are less than 0.01'')
<<label=metropolistry5>>=
out < metrop(out, nbatch = 500, blen = 400)
t.test(out$accept.batch)$conf.int
out$time
<<metropolisbatchtoo>>
<<metropolismcsemu>>
<<metropolismcsesigmasq>>
<<metropolismcsesigma>>
@
and some nicer output, which is presented in three tables
constructed from the R variables defined above
using the R \verb@xtable@ command in the \verb@xtable@ library.
\begin{table}[ht]
\caption{Posterior Means}
\label{tab:mu}
\begin{center}
<<label=tab1,echo=FALSE,results=tex>>=
foo < rbind(mu, mu.mcse)
dimnames(foo) < list(c("estimate", "MCSE"),
c("constant", paste("$x_", 1:4, "$", sep = "")))
library(xtable)
print(xtable(foo, digits = rep(4, 6),
align = c("l", rep("c", 5))), floating = FALSE,
caption.placement = "top",
sanitize.colnames.function = function(x) return(x))
@
\end{center}
\end{table}
\begin{table}[ht]
\caption{Posterior Variances}
\label{tab:sigmasq}
\begin{center}
<<label=tab1,echo=FALSE,results=tex>>=
foo < rbind(sigmasq, sigmasq.mcse)
dimnames(foo) < list(c("estimate", "MCSE"),
c("constant", paste("$x_", 1:4, "$", sep = "")))
library(xtable)
print(xtable(foo, digits = rep(4, 6),
align = c("l", rep("c", 5))), floating = FALSE,
caption.placement = "top",
sanitize.colnames.function = function(x) return(x))
@
\end{center}
\end{table}
\begin{table}[ht]
\caption{Posterior Standard Deviations}
\label{tab:sigma}
\begin{center}
<<label=tab1,echo=FALSE,results=tex>>=
foo < rbind(sigma, sigma.mcse)
dimnames(foo) < list(c("estimate", "MCSE"),
c("constant", paste("$x_", 1:4, "$", sep = "")))
library(xtable)
print(xtable(foo, digits = rep(4, 6),
align = c("l", rep("c", 5))), floating = FALSE,
caption.placement = "top",
sanitize.colnames.function = function(x) return(x))
@
\end{center}
\end{table}
Note for the record that the all the results presented in the tables
are from ``one long run'' where long here took only
<<label=time,echo=FALSE,results=tex>>=
cat(out$time[1], "\n")
@
seconds (on whatever computer it was run on).
\section{New Variance Estimation Functions}
R function \texttt{initseq} (added in version 0.6 of this package)
estimates variances in the Markov chain
central limit theorem (CLT) following the methodology introduced by
\citet[Section~3.3]{practical}. These methods only apply to scalarvalued
functionals of
reversible Markov chains, but the Markov chains produced by the \texttt{metrop}
function satisfy this condition, even, as we shall see below, when batching
is used.
Rather than redo the Markov chains in the preceding material, we just look
at a toy problem, an AR(1) time series, which can be simulated in one line
of R. This is the example on the help page for \texttt{initseq}.
<<x>>=
n < 2e4
rho < 0.99
x < arima.sim(model = list(ar = rho), n = n)
@
The time series \texttt{x} is a reversible Markov chain and trivially
a scalarvalued functional of a Markov chain.
Define
\begin{equation} \label{eq:little}
\gamma_k = \cov(X_i, X_{i + k})
\end{equation}
where the covariances refer to the stationary Markov chain having the
same transition probabilities as \texttt{x}. Then the variance in the CLT
is
$$
\sigma^2 = \gamma_0 + 2 \sum_{k = 1}^\infty \gamma_k
$$
\citep[Theorem~2.1]{practical}, that is,
$$
\bar{x}_n \approx \text{Normal}\left(\mu, \frac{\sigma^2}{n}\right),
$$
where $\mu = E(X_i)$ is the quantity being estimated by MCMC (in this
toy problem we know $\mu = 0$).
Naive estimates of $\sigma^2$ obtained by plugging in empirical
estimates of the gammas do not provide consistent estimation
\citep[Section~3.1]{practical}. Thus the scheme implemented
by the R function \texttt{initseq}. Define
\begin{equation} \label{eq:big}
\Gamma_k = \gamma_{2 k} + \gamma_{2 k + 1}
\end{equation}
\citet[Theorem~3.1]{practical} says that $\Gamma_k$ considered as a function
of $k$ is strictly positive, strictly decreasing, and strictly convex
(provided we are, as stated above, working with a reversible Markov chain).
Thus it makes sense to use estimators that use these properties.
The estimators implemented by the R function \texttt{initseq} and
described by \citet[Section~3.3]{practical} are conservativeconsistent
in the sense of Theorem~3.2 of that section.
Figure~\ref{fig:gamma} (page~\pageref{fig:gamma})
shows the time series plot made by the R statement
<<label=figgamtoo,include=FALSE>>=
out < initseq(x)
plot(seq(along = out$Gamma.pos)  1, out$Gamma.pos,
xlab = "k", ylab = expression(Gamma[k]), type = "l")
lines(seq(along = out$Gamma.dec)  1, out$Gamma.dec, lty = "dotted")
lines(seq(along = out$Gamma.con)  1, out$Gamma.con, lty = "dashed")
@
\begin{figure}
\begin{center}
<<label=figgam,fig=TRUE,echo=FALSE>>=
<<figgamtoo>>
@
\end{center}
\caption{Plot ``Big Gamma'' defined by \eqref{eq:little} and \eqref{eq:big}.
Solid line, initial positive sequence estimator.
Dotted line, initial monotone sequence estimator.
Dashed line, initial convex sequence estimator.}
\label{fig:gamma}
\end{figure}
One can use whichever curve one chooses, but now that
the \texttt{initseq} function makes the computation trivial, it makes
sense to use the initial convex sequence.
Of course, one is not interested in Figure~\ref{fig:gamma}, except
perhaps when explaining the methodology. What is actually important
is the estimate of $\sigma^2$, which is given by
<<assvar>>=
out$var.con
(1 + rho) / (1  rho) * 1 / (1  rho^2)
@
where for comparison we have given the exact theoretical value of $\sigma^2$,
which, of course, is never available in a nontoy problem.
These initial sequence estimators seem, at first sight to be a competitor
for the method of batch means. However, appearances can be deceiving.
The two methods are complementary. The sequence of batch means is itself
a scalarvalued functional of a reversible Markov chain. Hence the
initial sequence estimators can be applied to it.
<<batx>>=
blen < 5
x.batch < apply(matrix(x, nrow = blen), 2, mean)
bout < initseq(x.batch)
@
Because the batch length is too short, the variance of the batch means
does not estimate $\sigma^2$. We must account for the autocorrelation
of the batches, shown in Figure~\ref{fig:gambat}.
<<label=figgambattoo,include=FALSE>>=
plot(seq(along = bout$Gamma.con)  1, bout$Gamma.con,
xlab = "k", ylab = expression(Gamma[k]), type = "l")
@
\begin{figure}
\begin{center}
<<label=figgambat,fig=TRUE,echo=FALSE>>=
<<figgambattoo>>
@
\end{center}
\caption{Plot ``Big Gamma'' defined by \eqref{eq:little} and \eqref{eq:big}
for the sequence of batch means (batch length \Sexpr{blen}).
Only initial convex sequence estimator is shown.}
\label{fig:gambat}
\end{figure}
Because the the variance is proportional to one over the batch length,
we need to multiply by the batch length to estimate the $\sigma^2$
for the original series.
<<compvar>>=
out$var.con
bout$var.con * blen
@
Another way to look at this is that the MCMC estimator of $\mu$ is
either \texttt{mean(x)} or \texttt{mean(x.batch)}. And the variance
must be divided by the sample size to give standard errors. So either
<<cicon>>=
mean(x) + c(1, 1) * qnorm(0.975) * sqrt(out$var.con / length(x))
mean(x.batch) + c(1, 1) * qnorm(0.975) * sqrt(bout$var.con / length(x.batch))
@
is an asymptotic 95\% confidence interval for $\mu$. Just divide by
the relevant sample size.
\appendix
\section{Dotdotdot Versus Global Variables Versus Closures}
\label{sec:versus}
This appendix deals with three ways to pass information to a function
being passed to another R function (a higherorder function), for example when
\begin{itemize}
\item the function being passed is the objective function for an optimization
done by the higherorder function, which optimizes
(R function \texttt{optim} for example),
\item the function being passed is the integrand for an integration
done by the higherorder function, which integrates
(R function \texttt{integrate} for example),
\item the function being passed is the estimator of a parameter for a
bootstrap done by the higherorder function, which simulates
the (bootstrap approximation) of the sampling distribution of the
estimator (R function \texttt{boot} in R package \texttt{boot},
for example),
\item the function being passed is the log unnormalized density function
for a simulation
done by the higherorder function, which simulates the distribution
having that unnormalized density
(R function \texttt{metrop} in this package for example),
\end{itemize}
These ways are
\begin{itemize}
\item using dotdotdot (R syntax \verb@...@),
\item using global variables,
\item using closures, also called the function factory pattern.
\end{itemize}
The main body of this vignette uses the third option. This appendix
explains them all and the virtues and vices of each.
\subsection{Dotdotdot}
The dotdotdot mechanism is fairly easy to use when only one function is passed
to the higherorder function, but does require care and more work to use.
It is even harder to deal with when more than one function is passed to
the higherorder function. R functions \texttt{metrop} and \texttt{temper} in
this package can be passed two functions: the log unnormalized density
function and the output function.
\subsection{Only One Function Argument}
Previous versions of this vignette used the dotdotdot mechanism everywhere.
In those versions the log unnormalized density function was defined by
<<log.unnormalized.posteriordotdotdot>>=
lupost < function(beta, x, y) {
eta < as.numeric(x %*% beta)
logp < ifelse(eta < 0, eta  log1p(exp(eta)),  log1p(exp( eta)))
logq < ifelse(eta < 0,  log1p(exp(eta)),  eta  log1p(exp( eta)))
logl < sum(logp[y == 1]) + sum(logq[y == 0])
return(logl  sum(beta^2) / 8)
}
@
rather than the way it is now done in the main text above. Note that
everything is the same in the two definitions except here we have
\begin{verbatim}
lupost < function(beta, x, y) {
\end{verbatim}
where in the main text we now have
\begin{verbatim}
lupost_factory < function(x, y) function(beta) {
\end{verbatim}
and then we have to execute the function factory \verb@lupost_factory@
to make the \texttt{lupost} function.
But the main difference is that R function \texttt{lupost}
(the user written function specifying the log unnormalized posterior density)
\begin{itemize}
\item here has 3 arguments, \texttt{beta}, \texttt{x},
and \texttt{y}, and the latter two must be passed via the dotdotdot
mechanism, but
\item there has 1 argument, \texttt{beta}, and it
just knows about \texttt{x} and \texttt{y}  they are in its closure.
\end{itemize}
So to use this \texttt{lupost} function, we have to add arguments
\texttt{x} and \texttt{y} to each call to R function \texttt{metrop}.
For example
<<metropolistrydotdotdot>>=
out < glm(y ~ x1 + x2 + x3 + x4, data = logit,
family = binomial, x = TRUE)
x < out$x
y < out$y
out < metrop(lupost, beta.init, 1e3, x = x, y = y)
out$accept
out < metrop(out, scale = 0.1, x = x, y = y)
out$accept
out < metrop(out, scale = 0.3, x = x, y = y)
out$accept
out < metrop(out, scale = 0.5, x = x, y = y)
out$accept
out < metrop(out, scale = 0.4, x = x, y = y)
out$accept
@
and so forth.
This method has the benefit that we do not have to explain function factories
and has the drawback that we need to keep remembering to add
\verb@x = x, y = y@ to each invocation of R function \texttt{metrop}.
It is unclear to me which pattern is more mysterious to naive users.
\subsection{More Than One Function Argument}
The situation becomes more complicated (see the Warning section of the help
pages for R functions \texttt{metrop} and \texttt{temper}) when more than
one function argument is passed to the higherorder function. Then
\emph{they all must handle the \emph{same} dotdotdot arguments} whether
or not they want them.
So now we must define the output function as
<<outfundotdotdot>>=
outfun < function(z, ...) c(z, z^2)
@
The \verb@...@ argument in the function signature is essential because
this function is going to be passed dotdotdot arguments \texttt{x}
and \texttt{y}, which it does not need and does not want, so it has to
allow for them (and then not use them).
Then we can continue
<<outfuntrydotdotdot>>=
out < metrop(out, nbatch = 100, blen = 100, outfun = outfun,
x = x, y = y)
out$accept
@
and this works. But we would have gotten an error about unused arguments
if we had defined the output function without the dotdotdot as we did
in the main text.
Your humble author was himself confused about this for years and so doubts
that naive R users will find it intuitive.
\subsection{Global Variables}
As every programmer knows global variables are easy to use and evil
\citep{c2}. They are part of the way of R (or perhaps part of one of
the ways of R). They are OK for ``very small or oneoff programs'' \citep{c2}.
If you are going to throw the code away after one use, fine.
If you are going to give your code to other people or even use it
yourself six months later (after you have long forgotten the details),
then this method is evil and should be avoided.
In this method we define both functions passed to the higherorder function
without \verb@...@ and without using a function factory. We already
in the preceding section of this appendix defined R objects \texttt{x} and
\texttt{y} as global variables (in the R global environment \verb@.GlobalEnv@).
They are global variables and we use them as such, defining
<<functionsglobal>>=
lupost < function(beta) {
eta < as.numeric(x %*% beta)
logp < ifelse(eta < 0, eta  log1p(exp(eta)),  log1p(exp( eta)))
logq < ifelse(eta < 0,  log1p(exp(eta)),  eta  log1p(exp( eta)))
logl < sum(logp[y == 1]) + sum(logq[y == 0])
return(logl  sum(beta^2) / 8)
}
outfun < function(z) c(z, z^2)
@
Then the following works
<<doitglobal>>=
out < metrop(lupost, beta.init, 1e3)
out$accept
out < metrop(out, scale = 0.1)
out$accept
out < metrop(out, scale = 0.3)
out$accept
out < metrop(out, scale = 0.5)
out$accept
out < metrop(out, scale = 0.4)
out$accept
out < metrop(out, nbatch = 100, blen = 100, outfun = outfun)
out$accept
@
We get the best of both worlds. We don't need \verb@x = x, y = y@
and we don't need a function factory.
But if we change the name of the global variables to say
\texttt{modmat} and \texttt{resp} instead of \texttt{x} and \texttt{y},
then our code breaks. R function \texttt{lupost} is looking up
global variables \texttt{x} and \texttt{y} \emph{under those names}
not under any other names. So your code using this method is rigid and brittle
and probably unusable by others.
Note that if we are using either of the other methods, renaming is no problem.
Using dotdotdot we do
\begin{verbatim}
out < metrop(out, scale = 0.1, x = modmat, y = resp)
\end{verbatim}
Using the function factory we do
\begin{verbatim}
lupost < lupost_factory(modmat, resp)
\end{verbatim}
See Section~7.5.3 of \citet{basic} for more about this.
\subsection{Function Factory}
The terminology ``function factory pattern'' is apparently due to
\citet[Section~10.3.1]{wickham}. But it is just a special case about
how closures work (in R and all other languages that have them).
Compare
<<fred>>=
fred < function(y) function(x) x + y
fred(2)(3)
@
\citep[Section~6.5]{basic}) to
<<currying>>=
lupost_factory < function(x, y) function(beta) {
eta < as.numeric(x %*% beta)
logp < ifelse(eta < 0, eta  log1p(exp(eta)),  log1p(exp( eta)))
logq < ifelse(eta < 0,  log1p(exp(eta)),  eta  log1p(exp( eta)))
logl < sum(logp[y == 1]) + sum(logq[y == 0])
return(logl  sum(beta^2) / 8)
}
lupost < lupost_factory(x, y)
lupost(beta.init)
@
We could also do the same calculation treating \verb@lupost_factory@
as just an ordinary curried function, like R function \texttt{fred}
in the preceding example,
<<curryingtoo>>=
lupost_factory(x, y)(beta.init)
@
So there is nothing mysterious about function factories. They are just
one particular use of
the essence of functional programming (closures).
If you really understand functional programming, then you must understand this.
Long ago, when I switched from S to R,
I would not have considered that being an ``knowledgeable user'' of R
required knowledge of closures. Now I do. This is partly because other
functional languages like Scheme, Javascript, F\#, Clojure, and Haskell
have become
very popular for general computer programming. So now we want to emphasize
that we can do in R what we can do in them.
But it is also partly because I now understand more about functional
programming. I can now see that if you really understand closures,
then you don't need most other features of these programming languages
(including R). Following \citet{crockford} we can say that all programming
languages (including R) have their good parts and their bad parts and we
should use the good features and avoid the bad features. The best feature
is closures. Crockford calls this the best idea ever put in a programming
language. So we should use it a lot.
If I were starting to write the \texttt{mcmc} package today, I might leave
out the dotdotdot arguments of R functions \texttt{metrop},
\texttt{morph.metrop}, and \texttt{temper}. That would mean that
users could not use the dotdotdot pattern. They would be forced to
use the function factory pattern or the global variables pattern.
And if they had accepted that the global variables pattern is evil,
then they would have to use the function factory pattern.
(If you like the dotdotdot pattern, don't worry. It is not going
to be removed from this package. Backward compatibility trumps everything.)
\begin{thebibliography}{}
\bibitem[C2 Wiki(2013)]{c2}
C2 Wiki Contributors (2013).
\newblock Global Variables Are Bad.
\newblock \url{http://wiki.c2.com/?GlobalVariablesAreBad}.
\bibitem[Crockford(2008)]{crockford}
Crockford, D. (2008).
\newblock \emph{JavaScript: The Good Parts}.
\newblock O'Reilly, Sebastopol CA.
\bibitem[Gelman et al.(1996)Gelman, Roberts, and Gilks]{grg}
Gelman, A., G.~O. Roberts, and W.~R. Gilks (1996).
\newblock Efficient Metropolis jumping rules.
\newblock In \emph{Bayesian Statistics, 5 (Alicante, 1994)}, pp.~599607.
Oxford University Press.
\bibitem[Geyer(1992)]{practical}
Geyer, C.~J. (1992).
\newblock Practical Markov chain Monte Carlo (with discussion).
\newblock \emph{Statistical Science}, 7, 473511.
\bibitem[Geyer(2006)]{bogosity}
Geyer, C. J. (2006).
\newblock On the bogosity of MCMC diagnostics.
\newblock \url{http://users.stat.umn.edu/~geyer/mcmc/diag.html}.
\bibitem[Geyer(2011)]{intro}
Geyer, C. J. (2011).
\newblock Introduction to Markov chain Monte Carlo.
\newblock In \emph{Handbook of Markov Chain Monte Carlo}, edited by
Brooks, S., Gelman, A., Jones, G., and Meng, X.L., pp.~348.
\newblock Chapman \& Hall/CRC, Boca Raton, FL.
\bibitem[Geyer(2018)]{basic}
Geyer, C.~J. (2018).
\newblock Stat 3701 lecture notes: Basics of R.
\newblock \url{http://www.stat.umn.edu/geyer/3701/notes/basic.html}.
\bibitem[Geyer and Thompson(1995)]{geyertemp}
Geyer, C.~J. and E.~A. Thompson (1995).
\newblock Annealing Markov chain Monte Carlo with applications to
ancestral inference.
\newblock \emph{Journal of the American Statistical Association}, 90, 909920.
\bibitem[Wickham(2014)]{wickham}
Wickham, H. (2014).
\newblock \emph{Advanced R}.
\newblock Chapman \& Hall/CRC, Boca Raton, FL.
\newblock Also available on the web \url{http://advr.had.co.nz/}.
\end{thebibliography}
\end{document}
