File: demo.Rnw

package info (click to toggle)
r-cran-mcmc 0.9-7-1
  • links: PTS, VCS
  • area: main
  • in suites: bullseye, sid
  • size: 2,352 kB
  • sloc: ansic: 1,298; makefile: 14; sh: 8
file content (905 lines) | stat: -rw-r--r-- 33,848 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905

\documentclass{article}

\usepackage{natbib}
\usepackage{graphics}
\usepackage{amsmath}
\usepackage{indentfirst}
\usepackage{url}
\usepackage[utf8]{inputenc}

\DeclareMathOperator{\var}{var}
\DeclareMathOperator{\cov}{cov}

% \VignetteIndexEntry{MCMC Example}

\begin{document}

<<foo,include=FALSE,echo=FALSE>>=
options(keep.source = TRUE, width = 60)
@

\title{MCMC Package Example (Version \Sexpr{packageVersion("mcmc")})}
\author{Charles J. Geyer}
\maketitle

\section{The Problem}

This is an example of using the \verb@mcmc@ package in R.  The problem comes
from a take-home question on a (take-home) PhD qualifying exam
(School of Statistics, University of Minnesota).

Simulated data for the problem are in the dataset \verb@logit@.
There are five variables in the data set, the response \verb@y@
and four predictors, \verb@x1@, \verb@x2@, \verb@x3@, and \verb@x4@.

A frequentist analysis for the problem is done by the following R statements
<<frequentist>>=
library(mcmc)
data(logit)
out <- glm(y ~ x1 + x2 + x3 + x4, data = logit,
    family = binomial, x = TRUE)
summary(out)
@

But this problem isn't about that frequentist analysis, we want a Bayesian
analysis.  For our Bayesian analysis we assume the same data model as the
frequentist, and we assume the prior distribution of the five parameters
(the regression coefficients) makes them independent and identically
normally distributed with mean 0 and standard deviation 2.

The log unnormalized posterior density (log likelihood plus log prior)
for this model is calculated by
the following R function.  In order to avoid using either global variables
or \verb@...@ arguments, we use the function factory pattern
(\citealp[Section~10.3.1]{wickham};
\citealp[Section~7.5, especially Subsection~7.5.4]{basic};
see also Appendix~\ref{sec:versus} below).
<<log.unnormalized.posterior>>=
lupost_factory <- function(x, y) function(beta) {
    eta <- as.numeric(x %*% beta)
    logp <- ifelse(eta < 0, eta - log1p(exp(eta)), - log1p(exp(- eta)))
    logq <- ifelse(eta < 0, - log1p(exp(eta)), - eta - log1p(exp(- eta)))
    logl <- sum(logp[y == 1]) + sum(logq[y == 0])
    return(logl - sum(beta^2) / 8)
}

lupost <- lupost_factory(out$x, out$y)
@
The tricky calculation of the log likelihood avoids overflow and catastrophic
cancellation in calculation of $\log(p)$ and $\log(q)$ where
\begin{align*}
   p & = \frac{\exp(\eta)}{1 + \exp(\eta)} = \frac{1}{1 + \exp(- \eta)}
   \\
   q & = \frac{1}{1 + \exp(\eta)} = \frac{\exp(- \eta)}{1 + \exp(- \eta)}
\end{align*}
so taking logs gives
\begin{align*}
   \log(p) & = \eta - \log(1 + \exp(\eta)) = - \log(1 + \exp(- \eta))
   \\
   \log(q) & = - \log(1 + \exp(\eta)) = - \eta - \log(1 + \exp(- \eta))
\end{align*}
To avoid overflow, we always chose the case where the argument of $\exp$
is negative.  We have also avoided catastrophic cancellation when
$\lvert\eta\rvert$ is large.  If $\eta$ is large and positive, then
\begin{align*}
   p & \approx 1
   \\
   q & \approx 0
   \\
   \log(p) & \approx - \exp(- \eta)
   \\
   \log(q) & \approx - \eta - \exp(- \eta)
\end{align*}
and our use of the R function \texttt{log1p}, which calculates the
function $x \mapsto \log(1 + x)$
correctly for small $x$ avoids all problems.  The case where $\eta$ is large
and negative is similar.

\section{Beginning MCMC}

With those definitions in place, the following code runs the Metropolis
algorithm to simulate the posterior.
<<metropolis-try-1>>=
set.seed(42)    # to get reproducible results
beta.init <- as.numeric(coefficients(out))

out <- metrop(lupost, beta.init, 1e3)
names(out)
out$accept
@

The arguments to the \verb@metrop@ function used here (there are others
we don't use) are
\begin{itemize}
\item an R function (here \verb@lupost@) that evaluates the log unnormalized
    density of the desired stationary distribution of the Markov chain
    (here a posterior distribution).  Note that (although this example
    does not exhibit the phenomenon) that the unnormalized density may
    be zero, in which case the log unnormalized density is \verb@-Inf@.
\item an initial state (here \verb@beta.init@) of the Markov chain.
\item a number of batches (here \verb@1e3@) for the Markov chain.
    This combines with batch length and spacing (both 1 by default)
    to determine the number of iterations done.
\item additional arguments (here \verb@x@ and \verb@y@) supplied to
    provided functions (here \verb@lupost@).
\item there is no ``burn-in'' argument, although burn-in is easily
    accomplished, if desired (more on this below).
\end{itemize}

The output is in the component \verb@out$batch@ returned by the \verb@metrop@
function.  We'll look at it presently, but first we need to adjust the
proposal to get a higher acceptance rate (\verb@out$accept@).  It is generally
accepted \citep*{grg} that an acceptance rate of about 20\% is right, although
this recommendation is based on the asymptotic analysis of a toy problem
(simulating a multivariate normal distribution) for which one would never
use MCMC and is very unrepresentative of difficult MCMC applications.

\citet{geyer-temp} came to a similar conclusion,
that a 20\% acceptance rate is about right, in a very different situation.
But they also warned that a 20\% acceptance rate could be very wrong
and produced
an example where a 20\% acceptance rate was impossible and attempting to
reduce the acceptance rate below 70\% would keep the sampler from ever
visiting part of the state space.  So the 20\% magic number must be
considered like other rules of thumb we teach in intro courses
(like $n > 30$ means means normal approximation is valid).
We know these rules of thumb can fail.
There are examples in the literature where
they do fail.  We keep repeating them because we want something simple to
tell beginners, and they are all right for some problems.

Be that as it may, we try for 20\%.
<<metropolis-try-2>>=
out <- metrop(out, scale = 0.1)
out$accept
out <- metrop(out, scale = 0.3)
out$accept
out <- metrop(out, scale = 0.5)
out$accept
out <- metrop(out, scale = 0.4)
out$accept
@

Here the first argument to each instance of the \verb@metrop@ function is
the output of a previous invocation.  The Markov chain continues where
the previous run stopped, doing just what it would have done if it had
kept going, the initial state and random seed being the final state and
final random seed of the previous invocation.  Everything stays the same
except for the arguments supplied (here \verb@scale@).
\begin{itemize}
\item The argument \verb@scale@ controls the size of the Metropolis
    ``normal random walk'' proposal.  The default is \verb@scale = 1@.
    Big steps give lower acceptance rates.  Small steps give higher.
    We want something about 20\%.  It is also possible to make \verb@scale@
    a vector or a matrix.  See \verb@help(metrop)@.
\end{itemize}

Because each run starts where the last one stopped (when the first argument
to \verb@metrop@ is the output of the previous invocation), each run serves
as ``burn-in'' for its successor (assuming that any part of that run was
worth anything at all).

\section{Diagnostics}

O.~K.  That does it for the acceptance rate.  So let's do a longer run
and look at the results.
<<label=metropolis-try-3>>=
out <- metrop(out, nbatch = 1e4)
t.test(out$accept.batch)$conf.int
out$time
@
Here we do a Monte Carlo confidence interval
for the true unknown acceptance rate
(what we would see with an infinite Monte Carlo sample size).

Figure~\ref{fig:fig1} (page~\pageref{fig:fig1})
shows the time series plot made by the R statement
<<label=fig1too,include=FALSE>>=
plot(ts(out$batch))
@
\begin{figure}
\begin{center}
<<label=fig1,fig=TRUE,echo=FALSE>>=
<<fig1too>>
@
\end{center}
\caption{Time series plot of MCMC output.}
\label{fig:fig1}
\end{figure}

Another way to look at the output is an autocorrelation plot.
Figure~\ref{fig:fig2} (page~\pageref{fig:fig2})
shows the time series plot made by the R statement
<<label=fig2too,include=FALSE>>=
acf(out$batch)
@
\begin{figure}
\begin{center}
<<label=fig2,fig=TRUE,echo=FALSE>>=
<<fig2too>>
@
\end{center}
\caption{Autocorrelation plot of MCMC output.}
\label{fig:fig2}
\end{figure}

As with any multiplot plot, these are a bit hard to read.  Readers are
invited to make the separate plots to get a better picture.
As with all ``diagnostic'' plots in MCMC, these don't ``diagnose''
subtle problems.
\begin{quotation}
The purpose of regression diagnostics is to find obvious, gross,
embarrassing problems that jump out of simple plots \citep{bogosity}.
\end{quotation}
The time series plots will show \emph{obvious} nonstationarity.
They will not show \emph{nonobvious} nonstationarity.  They
provide no guarantee whatsoever that your Markov chain is sampling
anything remotely resembling the correct stationary distribution
(with log unnormalized density \verb@lupost@).  In this very easy
problem, we do not expect any convergence difficulties and so believe
what the diagnostics seem to show, but one is a fool to trust such
diagnostics in difficult problems.

The autocorrelation plots seem to show that the
the autocorrelations are negligible after about lag 25.
This diagnostic inference is reliable if the sampler is actually
working (has nearly reached equilibrium) and worthless otherwise.
Thus batches of length 25 should be sufficient, but let's use
length 100 to be safe.

A more judicious discussion of asymptotics is found
in \citet[Section~1.11.5]{intro}, which says
\begin{quotation}
Many [diagnostics] come with theorems, but the theorems never prove
the property you really want a diagnostic to have. These theorems say
that if the chain converges, then the diagnostic will probably say that
the chain converged, but they do not say that if the chain pseudo-converges,
then the diagnostic will probably say that the chain did not converge.
\end{quotation}

\section{Monte Carlo Estimates and Standard Errors}

<<label=metropolis-try-4>>=
out <- metrop(out, nbatch = 100, blen = 100,
    outfun = function(z) c(z, z^2))
t.test(out$accept.batch)$conf.int
out$time
@

We have added an argument \verb@outfun@ that gives the functional
of the Markov chain \citep[Section~1.6]{intro} we want to average.
For this problem we are interested
in both posterior mean and variance.  Mean is easy, just average the
variables in question.  But variance is a little tricky.  We need to
use the identity
$$
   \var(X) = E(X^2) - E(X)^2
$$
to write variance as a function of two things that can be estimated
by simple averages.  Hence we want to average the state itself and
the squares of each component.  Hence our \verb@outfun@ returns
\verb@c(z, z^2)@ for an argument (the state vector) \verb@z@.

\subsection{Simple Means}

The grand means (means of batch means) are
<<label=metropolis-batch>>=
apply(out$batch, 2, mean)
@
The first 5 numbers are the Monte Carlo estimates of the posterior means.
The second 5 numbers are the Monte Carlo estimates of the posterior
ordinary second moments.  We get the posterior variances by
<<label=metropolis-batch-too>>=
foo <- apply(out$batch, 2, mean)
mu <- foo[1:5]
sigmasq <- foo[6:10] - mu^2
mu
sigmasq
@

Monte Carlo standard errors (MCSE) are calculated from the batch means.
This is simplest for the means.
<<label=metropolis-mcse-mu>>=
mu.mcse <- apply(out$batch[ , 1:5], 2, sd) / sqrt(out$nbatch)
mu.mcse
@
The extra factor \verb@sqrt(out$nbatch)@ arises because the batch means
have variance $\sigma^2 / b$ where $b$ is the batch length, which is
\verb@out$blen@,
whereas the overall means \verb@mu@ have variance $\sigma^2 / n$ where
$n$ is the total number of iterations, which is \verb@out$blen * out$nbatch@.

\subsection{Functions of Means}

To get the MCSE for the posterior variances we apply the delta method.
Let $u_i$ denote the sequence of batch means of the first kind for one
parameter and $\bar{u}$ the grand mean (the estimate of the posterior mean
of that parameter),
let $v_i$ denote the sequence of batch means of the second kind for the
same parameter and $\bar{v}$ the grand mean (the estimate of the posterior
second absolute moment of that parameter), and let $\mu = E(\bar{u})$ and
$\nu = E(\bar{v})$.  Then the delta method linearizes the nonlinear function
$$
   g(\mu, \nu) = \nu - \mu^2
$$
as
$$
   \Delta g(\mu, \nu) = \Delta \nu - 2 \mu \Delta \mu
$$
saying that
$$
   g(\bar{u}, \bar{v}) - g(\mu, \nu)
$$
has the same asymptotic normal distribution as
$$
   (\bar{v} - \nu) - 2 \mu (\bar{u} - \mu)
$$
which, of course, has variance \verb@1 / nbatch@ times that of
$$
   (v_i - \nu) - 2 \mu (u_i - \mu)
$$
and this variance is estimated by
$$
   \frac{1}{n_{\text{batch}}} \sum_{i = 1}^{n_{\text{batch}}}
   \bigl[ (v_i - \bar{v}) - 2 \bar{u} (u_i - \bar{u}) \bigr]^2
$$
So
<<label=metropolis-mcse-sigmasq>>=
u <- out$batch[ , 1:5]
v <- out$batch[ , 6:10]
ubar <- apply(u, 2, mean)
vbar <- apply(v, 2, mean)
deltau <- sweep(u, 2, ubar)
deltav <- sweep(v, 2, vbar)
foo <- sweep(deltau, 2, ubar, "*")
sigmasq.mcse <- sqrt(apply((deltav - 2 * foo)^2, 2, mean) / out$nbatch)
sigmasq.mcse
@
does the MCSE for the posterior variance.

Let's just check that this complicated \verb@sweep@ and \verb@apply@ stuff
does do the right thing.
<<label=metropolis-mcse-sigmasq-too>>=
sqrt(mean(((v[ , 2] - vbar[2]) - 2 * ubar[2] * (u[ , 2] - ubar[2]))^2) /
    out$nbatch)
@

\paragraph{Comment} Through version 0.5 of this vignette it contained
an incorrect procedure for calculating this MCSE, justified by a handwave
(which was incorrect).
Essentially, it said to use the standard deviation of the batch means called
\verb@v@ here, which appears to be very conservative.

\subsection{Functions of Functions of Means}

If we are also interested in the posterior standard deviation
(a natural question, although not asked on the exam problem),
the delta method gives its standard error in terms of that
for the variance
<<label=metropolis-mcse-sigma>>=
sigma <- sqrt(sigmasq)
sigma.mcse <- sigmasq.mcse / (2 * sigma)
sigma
sigma.mcse
@

\section{A Final Run}

So that's it.  The only thing left to do is a little more precision
(the exam problem directed ``use a long enough run of your Markov chain
sampler so that the MCSE are less than 0.01'')
<<label=metropolis-try-5>>=
out <- metrop(out, nbatch = 500, blen = 400)
t.test(out$accept.batch)$conf.int
out$time
<<metropolis-batch-too>>
<<metropolis-mcse-mu>>
<<metropolis-mcse-sigmasq>>
<<metropolis-mcse-sigma>>
@
and some nicer output, which is presented in three tables
constructed from the R variables defined above
using the R \verb@xtable@ command in the \verb@xtable@ library.
\begin{table}[ht]
\caption{Posterior Means}
\label{tab:mu}
\begin{center}
<<label=tab1,echo=FALSE,results=tex>>=
foo <- rbind(mu, mu.mcse)
dimnames(foo) <- list(c("estimate", "MCSE"),
    c("constant", paste("$x_", 1:4, "$", sep = "")))
library(xtable)
print(xtable(foo, digits = rep(4, 6),
    align = c("l", rep("c", 5))), floating = FALSE,
    caption.placement = "top",
    sanitize.colnames.function = function(x) return(x))
@
\end{center}
\end{table}
\begin{table}[ht]
\caption{Posterior Variances}
\label{tab:sigmasq}
\begin{center}
<<label=tab1,echo=FALSE,results=tex>>=
foo <- rbind(sigmasq, sigmasq.mcse)
dimnames(foo) <- list(c("estimate", "MCSE"),
    c("constant", paste("$x_", 1:4, "$", sep = "")))
library(xtable)
print(xtable(foo, digits = rep(4, 6),
    align = c("l", rep("c", 5))), floating = FALSE,
    caption.placement = "top",
    sanitize.colnames.function = function(x) return(x))
@
\end{center}
\end{table}
\begin{table}[ht]
\caption{Posterior Standard Deviations}
\label{tab:sigma}
\begin{center}
<<label=tab1,echo=FALSE,results=tex>>=
foo <- rbind(sigma, sigma.mcse)
dimnames(foo) <- list(c("estimate", "MCSE"),
    c("constant", paste("$x_", 1:4, "$", sep = "")))
library(xtable)
print(xtable(foo, digits = rep(4, 6),
    align = c("l", rep("c", 5))), floating = FALSE,
    caption.placement = "top",
    sanitize.colnames.function = function(x) return(x))
@
\end{center}
\end{table}

Note for the record that the all the results presented in the tables
are from ``one long run'' where long here took only
<<label=time,echo=FALSE,results=tex>>=
cat(out$time[1], "\n")
@
seconds (on whatever computer it was run on).

\section{New Variance Estimation Functions}

R function \texttt{initseq} (added in version 0.6 of this package)
estimates variances in the Markov chain
central limit theorem (CLT) following the methodology introduced by
\citet[Section~3.3]{practical}.  These methods only apply to scalar-valued
functionals of
reversible Markov chains, but the Markov chains produced by the \texttt{metrop}
function satisfy this condition, even, as we shall see below, when batching
is used.

Rather than redo the Markov chains in the preceding material, we just look
at a toy problem, an AR(1) time series, which can be simulated in one line
of R.  This is the example on the help page for \texttt{initseq}.
<<x>>=
n <- 2e4
rho <- 0.99
x <- arima.sim(model = list(ar = rho), n = n)
@
The time series \texttt{x} is a reversible Markov chain and trivially
a scalar-valued functional of a Markov chain.

Define
\begin{equation} \label{eq:little}
   \gamma_k = \cov(X_i, X_{i + k})
\end{equation}
where the covariances refer to the stationary Markov chain having the
same transition probabilities as \texttt{x}.  Then the variance in the CLT
is
$$
   \sigma^2 = \gamma_0 + 2 \sum_{k = 1}^\infty \gamma_k
$$
\citep[Theorem~2.1]{practical}, that is,
$$
   \bar{x}_n \approx \text{Normal}\left(\mu, \frac{\sigma^2}{n}\right),
$$
where $\mu = E(X_i)$ is the quantity being estimated by MCMC (in this
toy problem we know $\mu = 0$).

Naive estimates of $\sigma^2$ obtained by plugging in empirical
estimates of the gammas do not provide consistent estimation
\citep[Section~3.1]{practical}.  Thus the scheme implemented
by the R function \texttt{initseq}.  Define
\begin{equation} \label{eq:big}
   \Gamma_k = \gamma_{2 k} + \gamma_{2 k + 1}
\end{equation}
\citet[Theorem~3.1]{practical} says that $\Gamma_k$ considered as a function
of $k$ is strictly positive, strictly decreasing, and strictly convex
(provided we are, as stated above, working with a reversible Markov chain).
Thus it makes sense to use estimators that use these properties.
The estimators implemented by the R function \texttt{initseq} and
described by \citet[Section~3.3]{practical} are conservative-consistent
in the sense of Theorem~3.2 of that section.

Figure~\ref{fig:gamma} (page~\pageref{fig:gamma})
shows the time series plot made by the R statement
<<label=figgamtoo,include=FALSE>>=
out <- initseq(x)
plot(seq(along = out$Gamma.pos) - 1, out$Gamma.pos,
        xlab = "k", ylab = expression(Gamma[k]), type = "l")
lines(seq(along = out$Gamma.dec) - 1, out$Gamma.dec, lty = "dotted")
lines(seq(along = out$Gamma.con) - 1, out$Gamma.con, lty = "dashed")
@
\begin{figure}
\begin{center}
<<label=figgam,fig=TRUE,echo=FALSE>>=
<<figgamtoo>>
@
\end{center}
\caption{Plot ``Big Gamma'' defined by \eqref{eq:little} and \eqref{eq:big}.
Solid line, initial positive sequence estimator.
Dotted line, initial monotone sequence estimator.
Dashed line, initial convex sequence estimator.}
\label{fig:gamma}
\end{figure}
One can use whichever curve one chooses, but now that
the \texttt{initseq} function makes the computation trivial, it makes
sense to use the initial convex sequence.

Of course, one is not interested in Figure~\ref{fig:gamma}, except
perhaps when explaining the methodology.  What is actually important
is the estimate of $\sigma^2$, which is given by
<<assvar>>=
out$var.con
(1 + rho) / (1 - rho) * 1 / (1 - rho^2)
@
where for comparison we have given the exact theoretical value of $\sigma^2$,
which, of course, is never available in a non-toy problem.

These initial sequence estimators seem, at first sight to be a competitor
for the method of batch means.  However, appearances can be deceiving.
The two methods are complementary.  The sequence of batch means is itself
a scalar-valued functional of a reversible Markov chain.  Hence the
initial sequence estimators can be applied to it.
<<batx>>=
blen <- 5
x.batch <- apply(matrix(x, nrow = blen), 2, mean)
bout <- initseq(x.batch)
@
Because the batch length is too short, the variance of the batch means
does not estimate $\sigma^2$.  We must account for the autocorrelation
of the batches, shown in Figure~\ref{fig:gambat}.
<<label=figgambattoo,include=FALSE>>=
plot(seq(along = bout$Gamma.con) - 1, bout$Gamma.con,
        xlab = "k", ylab = expression(Gamma[k]), type = "l")
@
\begin{figure}
\begin{center}
<<label=figgambat,fig=TRUE,echo=FALSE>>=
<<figgambattoo>>
@
\end{center}
\caption{Plot ``Big Gamma'' defined by \eqref{eq:little} and \eqref{eq:big}
for the sequence of batch means (batch length \Sexpr{blen}).
Only initial convex sequence estimator is shown.}
\label{fig:gambat}
\end{figure}
Because the the variance is proportional to one over the batch length,
we need to multiply by the batch length to estimate the $\sigma^2$
for the original series.
<<compvar>>=
out$var.con
bout$var.con * blen
@
Another way to look at this is that the MCMC estimator of $\mu$ is
either \texttt{mean(x)} or \texttt{mean(x.batch)}.  And the variance
must be divided by the sample size to give standard errors.  So either
<<ci-con>>=
mean(x) + c(-1, 1) * qnorm(0.975) * sqrt(out$var.con / length(x))
mean(x.batch) + c(-1, 1) * qnorm(0.975) * sqrt(bout$var.con / length(x.batch))
@
is an asymptotic 95\% confidence interval for $\mu$.  Just divide by
the relevant sample size.

\appendix

\section{Dot-dot-dot Versus Global Variables Versus Closures}
\label{sec:versus}

This appendix deals with three ways to pass information to a function
being passed to another R function (a higher-order function), for example when
\begin{itemize}
\item the function being passed is the objective function for an optimization
    done by the higher-order function, which optimizes
    (R function \texttt{optim} for example),
\item the function being passed is the integrand for an integration
    done by the higher-order function, which integrates
    (R function \texttt{integrate} for example),
\item the function being passed is the estimator of a parameter for a
    bootstrap done by the higher-order function, which simulates
    the (bootstrap approximation) of the sampling distribution of the
    estimator (R function \texttt{boot} in R package \texttt{boot},
    for example),
\item the function being passed is the log unnormalized density function
    for a simulation
    done by the higher-order function, which simulates the distribution
    having that unnormalized density
    (R function \texttt{metrop} in this package for example),
\end{itemize}
These ways are
\begin{itemize}
\item using dot-dot-dot (R syntax \verb@...@),
\item using global variables,
\item using closures, also called the function factory pattern.
\end{itemize}
The main body of this vignette uses the third option.  This appendix
explains them all and the virtues and vices of each.

\subsection{Dot-dot-dot}

The dot-dot-dot mechanism is fairly easy to use when only one function is passed
to the higher-order function, but does require care and more work to use.
It is even harder to deal with when more than one function is passed to
the higher-order function.  R functions \texttt{metrop} and \texttt{temper} in
this package can be passed two functions: the log unnormalized density
function and the output function.

\subsection{Only One Function Argument}

Previous versions of this vignette used the dot-dot-dot mechanism everywhere.
In those versions the log unnormalized density function was defined by
<<log.unnormalized.posterior-dot-dot-dot>>=
lupost <- function(beta, x, y) {
     eta <- as.numeric(x %*% beta)
     logp <- ifelse(eta < 0, eta - log1p(exp(eta)), - log1p(exp(- eta)))
     logq <- ifelse(eta < 0, - log1p(exp(eta)), - eta - log1p(exp(- eta)))
     logl <- sum(logp[y == 1]) + sum(logq[y == 0])
     return(logl - sum(beta^2) / 8)
}
@
rather than the way it is now done in the main text above.  Note that
everything is the same in the two definitions except here we have
\begin{verbatim}
lupost <- function(beta, x, y) {
\end{verbatim}
where in the main text we now have
\begin{verbatim}
lupost_factory <- function(x, y) function(beta) {
\end{verbatim}
and then we have to execute the function factory \verb@lupost_factory@
to make the \texttt{lupost} function.

But the main difference is that R function \texttt{lupost}
(the user written function specifying the log unnormalized posterior density)
\begin{itemize}
\item here has 3 arguments, \texttt{beta}, \texttt{x},
    and \texttt{y}, and the latter two must be passed via the dot-dot-dot
    mechanism, but
\item there has 1 argument, \texttt{beta}, and it
    just knows about \texttt{x} and \texttt{y} --- they are in its closure.
\end{itemize}

So to use this \texttt{lupost} function, we have to add arguments
\texttt{x} and \texttt{y} to each call to R function \texttt{metrop}.
For example
<<metropolis-try-dot-dot-dot>>=
out <- glm(y ~ x1 + x2 + x3 + x4, data = logit,
    family = binomial, x = TRUE)
x <- out$x
y <- out$y

out <- metrop(lupost, beta.init, 1e3, x = x, y = y)
out$accept
out <- metrop(out, scale = 0.1, x = x, y = y)
out$accept
out <- metrop(out, scale = 0.3, x = x, y = y)
out$accept
out <- metrop(out, scale = 0.5, x = x, y = y)
out$accept
out <- metrop(out, scale = 0.4, x = x, y = y)
out$accept
@
and so forth.

This method has the benefit that we do not have to explain function factories
and has the drawback that we need to keep remembering to add
\verb@x = x, y = y@ to each invocation of R function \texttt{metrop}.
It is unclear to me which pattern is more mysterious to naive users.

\subsection{More Than One Function Argument}

The situation becomes more complicated (see the Warning section of the help
pages for R functions \texttt{metrop} and \texttt{temper}) when more than
one function argument is passed to the higher-order function.  Then
\emph{they all must handle the \emph{same} dot-dot-dot arguments} whether
or not they want them.

So now we must define the output function as
<<outfun-dot-dot-dot>>=
outfun <- function(z, ...) c(z, z^2)
@
The \verb@...@ argument in the function signature is essential because
this function is going to be passed dot-dot-dot arguments \texttt{x}
and \texttt{y}, which it does not need and does not want, so it has to
allow for them (and then not use them).

Then we can continue
<<outfun-try-dot-dot-dot>>=
out <- metrop(out, nbatch = 100, blen = 100, outfun = outfun,
    x = x, y = y)
out$accept
@
and this works.  But we would have gotten an error about unused arguments
if we had defined the output function without the dot-dot-dot as we did
in the main text.

Your humble author was himself confused about this for years and so doubts
that naive R users will find it intuitive.

\subsection{Global Variables}

As every programmer knows global variables are easy to use and evil
\citep{c2}.  They are part of the way of R (or perhaps part of one of
the ways of R).  They are OK for ``very small or one-off programs'' \citep{c2}.
If you are going to throw the code away after one use, fine.
If you are going to give your code to other people or even use it
yourself six months later (after you have long forgotten the details),
then this method is evil and should be avoided.

In this method we define both functions passed to the higher-order function
without \verb@...@ and without using a function factory.  We already
in the preceding section of this appendix defined R objects \texttt{x} and
\texttt{y} as global variables (in the R global environment \verb@.GlobalEnv@).
They are global variables and we use them as such, defining
<<functions-global>>=
lupost <- function(beta) {
     eta <- as.numeric(x %*% beta)
     logp <- ifelse(eta < 0, eta - log1p(exp(eta)), - log1p(exp(- eta)))
     logq <- ifelse(eta < 0, - log1p(exp(eta)), - eta - log1p(exp(- eta)))
     logl <- sum(logp[y == 1]) + sum(logq[y == 0])
     return(logl - sum(beta^2) / 8)
}
outfun <- function(z) c(z, z^2)
@
Then the following works
<<doit-global>>=
out <- metrop(lupost, beta.init, 1e3)
out$accept
out <- metrop(out, scale = 0.1)
out$accept
out <- metrop(out, scale = 0.3)
out$accept
out <- metrop(out, scale = 0.5)
out$accept
out <- metrop(out, scale = 0.4)
out$accept
out <- metrop(out, nbatch = 100, blen = 100, outfun = outfun)
out$accept
@

We get the best of both worlds.  We don't need \verb@x = x, y = y@
and we don't need a function factory.

But if we change the name of the global variables to say
\texttt{modmat} and \texttt{resp} instead of \texttt{x} and \texttt{y},
then our code breaks.  R function \texttt{lupost} is looking up
global variables \texttt{x} and \texttt{y} \emph{under those names}
not under any other names.  So your code using this method is rigid and brittle
and probably unusable by others.

Note that if we are using either of the other methods, renaming is no problem.
Using dot-dot-dot we do
\begin{verbatim}
out <- metrop(out, scale = 0.1, x = modmat, y = resp)
\end{verbatim}
Using the function factory we do
\begin{verbatim}
lupost <- lupost_factory(modmat, resp)
\end{verbatim}
See Section~7.5.3 of \citet{basic} for more about this.

\subsection{Function Factory}

The terminology ``function factory pattern'' is apparently due to
\citet[Section~10.3.1]{wickham}.  But it is just a special case about
how closures work (in R and all other languages that have them).
Compare
<<fred>>=
fred <- function(y) function(x) x + y
fred(2)(3)
@
\citep[Section~6.5]{basic}) to
<<currying>>=
lupost_factory <- function(x, y) function(beta) {
    eta <- as.numeric(x %*% beta)
    logp <- ifelse(eta < 0, eta - log1p(exp(eta)), - log1p(exp(- eta)))
    logq <- ifelse(eta < 0, - log1p(exp(eta)), - eta - log1p(exp(- eta)))
    logl <- sum(logp[y == 1]) + sum(logq[y == 0])
    return(logl - sum(beta^2) / 8)
}
lupost <- lupost_factory(x, y)
lupost(beta.init)
@
We could also do the same calculation treating \verb@lupost_factory@
as just an ordinary curried function, like R function \texttt{fred}
in the preceding example,
<<currying-too>>=
lupost_factory(x, y)(beta.init)
@

So there is nothing mysterious about function factories.  They are just
one particular use of
the essence of functional programming (closures).
If you really understand functional programming, then you must understand this.

Long ago, when I switched from S to R,
I would not have considered that being an ``knowledgeable user'' of R
required knowledge of closures.  Now I do.  This is partly because other
functional languages like Scheme, Javascript, F\#, Clojure, and Haskell
have become
very popular for general computer programming.  So now we want to emphasize
that we can do in R what we can do in them.
But it is also partly because I now understand more about functional
programming.  I can now see that if you really understand closures,
then you don't need most other features of these programming languages
(including R).  Following \citet{crockford} we can say that all programming
languages (including R) have their good parts and their bad parts and we
should use the good features and avoid the bad features.  The best feature
is closures.  Crockford calls this the best idea ever put in a programming
language.  So we should use it a lot.

If I were starting to write the \texttt{mcmc} package today, I might leave
out the dot-dot-dot arguments of R functions \texttt{metrop},
\texttt{morph.metrop}, and \texttt{temper}.  That would mean that
users could not use the dot-dot-dot pattern.  They would be forced to
use the function factory pattern or the global variables pattern.
And if they had accepted that the global variables pattern is evil,
then they would have to use the function factory pattern.
(If you like the dot-dot-dot pattern, don't worry.  It is not going
to be removed from this package.  Backward compatibility trumps everything.)

\begin{thebibliography}{}

\bibitem[C2 Wiki(2013)]{c2}
C2 Wiki Contributors (2013).
\newblock Global Variables Are Bad.
\newblock \url{http://wiki.c2.com/?GlobalVariablesAreBad}.

\bibitem[Crockford(2008)]{crockford}
Crockford, D. (2008).
\newblock \emph{JavaScript: The Good Parts}.
\newblock O'Reilly, Sebastopol CA.

\bibitem[Gelman et al.(1996)Gelman, Roberts, and Gilks]{grg}
Gelman, A., G.~O. Roberts, and W.~R. Gilks (1996).
\newblock Efficient Metropolis jumping rules.
\newblock In \emph{Bayesian Statistics, 5 (Alicante, 1994)}, pp.~599--607.
  Oxford University Press.

\bibitem[Geyer(1992)]{practical}
Geyer, C.~J. (1992).
\newblock Practical Markov chain Monte Carlo (with discussion).
\newblock \emph{Statistical Science}, 7, 473--511.

\bibitem[Geyer(2006)]{bogosity}
Geyer, C. J. (2006).
\newblock On the bogosity of MCMC diagnostics.
\newblock \url{http://users.stat.umn.edu/~geyer/mcmc/diag.html}.

\bibitem[Geyer(2011)]{intro}
Geyer, C. J. (2011).
\newblock Introduction to Markov chain Monte Carlo.
\newblock In \emph{Handbook of Markov Chain Monte Carlo}, edited by
    Brooks, S., Gelman, A., Jones, G., and Meng, X.-L., pp.~3--48.
\newblock Chapman \& Hall/CRC, Boca Raton, FL.

\bibitem[Geyer(2018)]{basic}
Geyer, C.~J. (2018).
\newblock Stat 3701 lecture notes: Basics of R.
\newblock \url{http://www.stat.umn.edu/geyer/3701/notes/basic.html}.

\bibitem[Geyer and Thompson(1995)]{geyer-temp}
Geyer, C.~J. and E.~A. Thompson (1995).
\newblock Annealing Markov chain Monte Carlo with applications to
    ancestral inference.
\newblock \emph{Journal of the American Statistical Association}, 90, 909--920.

\bibitem[Wickham(2014)]{wickham}
Wickham, H. (2014).
\newblock \emph{Advanced R}.
\newblock Chapman \& Hall/CRC, Boca Raton, FL.
\newblock Also available on the web \url{http://adv-r.had.co.nz/}.

\end{thebibliography}

\end{document}