File: temper.Rd

\name{temper}
\alias{temper}
\alias{temper.function}
\alias{temper.tempering}
\title{Simulated Tempering and Umbrella Sampling}
\description{
    Markov chain Monte Carlo (MCMC) for continuous random vectors using
    parallel or serial tempering, the latter also called umbrella
    sampling and simulated tempering.
    The chain simulates \eqn{k} different distributions on the
    same state space.  In parallel tempering, all the distributions are
    simulated in each iteration.  In serial tempering, only one of the
    distributions is simulated (a random one).  In parallel tempering,
    the state is a \eqn{k \times p}{k by p} matrix, each row of which
    is the state for one of the distributions.
    In serial tempering the state of the Markov chain is a pair \eqn{(i, x)},
    where \eqn{i} is an integer between 1 and \eqn{k} and \eqn{x} is a vector
    of length \eqn{p}.  This pair is represented as a single real vector
    \code{c(i, x)}.  The variable \eqn{i} says which distribution \eqn{x}
    is a simulation for.
}
\usage{
temper(obj, initial, neighbors, nbatch, blen = 1, nspac = 1, scale = 1,
    outfun, debug = FALSE, parallel = FALSE, \dots)
\method{temper}{function}(obj, initial, neighbors, nbatch,
    blen = 1, nspac = 1, scale = 1,
    outfun, debug = FALSE, parallel = FALSE, \dots)
\method{temper}{tempering}(obj, initial, neighbors, nbatch,
    blen = 1, nspac = 1, scale = 1,
    outfun, debug = FALSE, parallel = FALSE, \dots)
}
\arguments{
  \item{obj}{either an \R function or an object of class \code{"tempering"}
    from a previous run.

    If a function, it should evaluate the log unnormalized
    density \eqn{\log h(i, x)}{log h(i, x)} of the desired equilibrium
    distribution of the Markov chain for serial tempering (the same function
    is used for both serial and parallel tempering, see Details below for
    further explanation).

    If an object of class \code{"tempering"},
    the log unnormalized density function
    is \code{obj$lud}, and missing arguments of \code{temper} are
    obtained from the corresponding elements of \code{obj}.

    The first argument of the log unnormalized density function is
    an \R vector \code{c(i, x)}, where \code{i} says which distribution
    \code{x} is supposed to be a simulation for.
    Other arguments are arbitrary and taken from
    the \code{\dots} arguments of \code{temper}.  The log unnormalized density
    function should return \code{-Inf} in regions of the state space having
    probability zero.}
  \item{initial}{for serial tempering, a real vector \code{c(i, x)} as
    described above.  For parallel tempering, a real
    \eqn{k \times p}{k by p} matrix as described above.  In either case,
    the initial state of the Markov chain.
    Ignored if \code{obj} has class \code{"tempering"}.}
  \item{neighbors}{a logical symmetric matrix of dimension \code{k}
    by \code{k}.  Elements that are \code{TRUE} indicate jumps
    or swaps attempted by the Markov chain.
    Ignored if \code{obj} has class \code{"tempering"}.}
  \item{nbatch}{the number of batches.}
  \item{blen}{the length of batches.}
  \item{nspac}{the spacing of iterations that contribute to batches.}
  \item{scale}{controls the proposal step size for real elements of the state
    vector.  For serial tempering, proposing a new value for the \eqn{x}
    part of the state \eqn{(i, x)}.  For parallel tempering, proposing
    a new value for the \eqn{x_i}{x[i]} part of the state
    \eqn{(x_1, \ldots, x_k)}{(x[1], \ldots, x[k])}.  In either case, the proposal
    is a real vector of length \eqn{p}.  If \code{scale} is scalar or vector,
    the proposal is \code{x + scale * z}, where \code{x} is the part \eqn{x} or
    \eqn{x_i}{x[i]} of the state the proposal may replace and \code{z} is a
    standard normal random vector.
    If matrix, the proposal is
    \code{x + scale \%*\% z}.  If list, the length must be \code{k},
    and each element must be scalar, vector, or matrix, and operate as
    described above.  The \eqn{i}-th component of the list is used to update
    \eqn{x} when the state is \eqn{(i, x)} or \eqn{x_i}{x[i]} otherwise.}
  \item{outfun}{controls the output.  If a function, then the batch means
      of \code{outfun(state, \dots)} are returned.  The argument \code{state}
      is like the argument \code{initial} of this function.  If missing, the
      batch means of the real part of the state vector or matrix are returned,
      and for serial tempering the batch means of a multivariate Bernoulli
      indicating the current component are returned.}
  \item{debug}{if \code{TRUE}, extra output useful for testing is returned.}
  \item{parallel}{if \code{TRUE} does parallel tempering, if \code{FALSE} does
      serial tempering.
      Ignored if \code{obj} has class \code{"tempering"}.}
  \item{...}{additional arguments for \code{obj} or \code{outfun}.}
}
\details{
Serial tempering simulates a mixture of distributions of a continuous random
vector.  The number of components of the mixture is \code{k}, and the dimension
of the random vector is \code{p}.  Denote the state \eqn{(i, x)}, where \eqn{i}
is a positive integer between 1 and \eqn{k}, and let \eqn{h(i, x)} denote
the unnormalized joint density of their equilibrium distribution.
The logarithm of this function is what \code{obj} or \code{obj$lud} calculates.
The mixture distribution is the marginal for \eqn{x} derived from
the equilibrium distribution \eqn{h(i, x)}, that is,
\deqn{h(x) = \sum_{i = 1}^k h(i, x)}{h(x) = sum[i = 1 to k] h(i, x)}

Parallel tempering simulates a product of distributions of a continuous random
vector.  Denote the state \eqn{(x_1, \ldots, x_k)}{(x[1], \ldots, x[k])},
then the unnormalized joint density of the equilibrium distribution is
\deqn{h(x_1, \ldots, x_k) = \prod_{i = 1}^k h(i, x_i)}{h(x[1], \dots, x[k]) = prod[i = 1 to k] h(i, x[i])}

The update mechanism of the Markov chain combines two kinds of elementary
updates: jump/swap updates (jump for serial tempering, swap for parallel
tempering) and within-component updates.  In each iteration of the Markov chain
one of these elementary updates is done: with probability 1/2 a jump/swap
update is done, and with probability 1/2 a within-component update is done.

Within-component updates are the same for both serial and parallel tempering.
They are \dQuote{random-walk} Metropolis updates with multivariate normal
proposal, the proposal distribution being determined by the argument
\code{scale}.  In serial tempering, the \eqn{x} part of the current state
\eqn{(i, x)} is updated preserving \eqn{h(i, x)}.
In parallel tempering, an index \eqn{i} is chosen at random and the part
of the state \eqn{x_i}{x[i]} representing that component is updated,
again preserving \eqn{h(i, x_i)}{h(i, x[i])}.
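
The following is a minimal sketch in \R of one within-component update for
serial tempering (an illustration only, not the implementation, which is in C;
the names \code{update.x} and \code{lud} are hypothetical, with \code{lud}
standing for the log unnormalized density function and \code{scale} assumed
scalar or vector):
\preformatted{
## one random-walk Metropolis update of the x part of the state c(i, x),
## leaving the component index i alone (sketch only)
update.x <- function(state, lud, scale) {
    i <- state[1]
    x <- state[-1]
    prop <- x + scale * rnorm(length(x))         # multivariate normal proposal
    log.ratio <- lud(c(i, prop)) - lud(c(i, x))  # log Metropolis ratio
    if (log(runif(1)) < log.ratio) x <- prop     # accept or reject
    c(i, x)
}
}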

Jump updates choose uniformly at random a neighbor of the current component:
if \eqn{i} indexes the current component, then it chooses uniformly at random
a \eqn{j} such that \code{neighbors[i, j] == TRUE}.  It then does a
Metropolis-Hastings update for changing the current state from \eqn{(i, x)}
to \eqn{(j, x)}.
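
A minimal sketch of a jump update follows (again an illustration only, with
hypothetical names; the term involving the numbers of neighbors is the
standard Metropolis-Hastings correction for this proposal, needed because
components may have different numbers of neighbors):
\preformatted{
## one jump update for serial tempering (sketch only)
update.i <- function(state, lud, neighbors) {
    i <- state[1]
    x <- state[-1]
    nbr <- which(neighbors[i, ])
    j <- if (length(nbr) == 1) nbr else sample(nbr, 1)
    log.ratio <- lud(c(j, x)) - lud(c(i, x)) +
        log(length(nbr)) - log(sum(neighbors[j, ]))  # Hastings correction
    if (log(runif(1)) < log.ratio) i <- j
    c(i, x)
}
}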

Swap updates choose a component uniformly at random and a neighbor of that
component uniformly at random: first an index \eqn{i} is chosen uniformly
at random between 1 and \eqn{k}, then an index \eqn{j} is chosen
uniformly at random such that \code{neighbors[i, j] == TRUE}.  It then does
a Metropolis-Hastings update for swapping the states of the
two components: interchanging \eqn{x_i}{x[i, ]} and \eqn{x_j}{x[j, ]}
while preserving \eqn{h(x_1, \ldots, x_k)}{h(x[1], \dots, x[k])}.
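
A minimal sketch of a swap update follows (illustration only, hypothetical
names).  Because the neighbor structure does not depend on the state, the
probability of proposing a given swap is the same before and after the swap,
so only the ratio of joint densities enters the acceptance probability.
\preformatted{
## one swap update for parallel tempering, state x is a k by p matrix
## (sketch only)
update.swap <- function(x, lud, neighbors) {
    k <- nrow(x)
    i <- sample(k, 1)
    nbr <- which(neighbors[i, ])
    j <- if (length(nbr) == 1) nbr else sample(nbr, 1)
    log.ratio <- lud(c(i, x[j, ])) + lud(c(j, x[i, ])) -
        lud(c(i, x[i, ])) - lud(c(j, x[j, ]))
    if (log(runif(1)) < log.ratio) x[c(i, j), ] <- x[c(j, i), ]
    x
}
}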

The initial state must satisfy \code{lud(initial, ...) > - Inf} for serial
tempering or must satisfy \code{lud(initial[i, ], ...) > - Inf} for each
\code{i} for parallel tempering, where \code{lud} is either \code{obj}
or \code{obj$lud}.
That is, the initial state must have positive probability.
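
For example, this requirement can be checked before calling \code{temper}
as follows (a sketch only, with \code{lud} standing for the log unnormalized
density function and \code{initial} for the intended initial state):
\preformatted{
## serial tempering: initial is c(i, x)
stopifnot(lud(initial) > -Inf)
## parallel tempering: initial is a k by p matrix
stopifnot(all(sapply(1:nrow(initial),
    function(i) lud(c(i, initial[i, ])) > -Inf)))
}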
}
\value{
  an object of class \code{"mcmc"}, subclass \code{"tempering"},
  which is a list containing at least the following components:
  \item{batch}{the batch means of the continuous part of the state.
    If \code{outfun} is missing, an \code{nbatch} by \code{k} by \code{p}
    array.  Otherwise, an \code{nbatch} by \code{m} matrix, where \code{m}
    is the length of the result of \code{outfun}.}
  \item{ibatch}{(returned for serial tempering only) an \code{nbatch}
    by \code{k} matrix giving batch means for the multivariate Bernoulli
    random vector that is all zeros except for a 1 in the \code{i}-th place
    when the current state is \eqn{(i, x)}.}
  \item{acceptx}{fraction of Metropolis within-component proposals accepted.
    A vector of length \code{k} giving the acceptance rate for each component.}
  \item{accepti}{fraction of Metropolis jump/swap proposals accepted.
    A \code{k} by \code{k} matrix giving the acceptance rate for each allowed
    jump or swap.  \code{NA} for elements such that the corresponding
    element of \code{neighbors} is \code{FALSE}.}
  \item{initial}{value of argument \code{initial}.}
  \item{final}{final state of Markov chain.}
  \item{initial.seed}{value of \code{.Random.seed} before the run.}
  \item{final.seed}{value of \code{.Random.seed} after the run.}
  \item{time}{running time of Markov chain from \code{system.time()}.}
  \item{lud}{the function used to calculate log unnormalized density,
  either \code{obj} or \code{obj$lud} from a previous run.}
  \item{nbatch}{the argument \code{nbatch} or \code{obj$nbatch}.}
  \item{blen}{the argument \code{blen} or \code{obj$blen}.}
  \item{nspac}{the argument \code{nspac} or \code{obj$nspac}.}
  \item{outfun}{the argument \code{outfun} or \code{obj$outfun}.}
  Description of additional output when \code{debug = TRUE} can be
  found in the vignette \code{debug}, which is shown by
  \code{vignette("debug", "mcmc")}.
}
\section{Warning}{
If \code{outfun} is missing, then the log unnormalized
density function can be defined without a \ldots argument, and that works fine.
One can define it starting \code{ludfun <- function(state)}, and that works,
or \code{ludfun <- function(state, foo, bar)}, where \code{foo} and \code{bar}
are supplied as additional arguments to \code{temper}, and that works too.

If \code{outfun} is a function, then both it and the log unnormalized
density function can be defined without \ldots arguments \emph{if they
have exactly the same arguments list} and that works fine.  Otherwise it
doesn't work.  Define these functions by
\preformatted{
ludfun <- function(state, foo)
outfun <- function(state, bar)
}
and you get an error about unused arguments.  Instead define these functions by
\preformatted{
ludfun <- function(state, foo, ...)
outfun <- function(state, bar, ...)
}
and supply \code{foo} and \code{bar} as additional arguments to \code{temper},
and that works fine.

In short, the log unnormalized density function and \code{outfun} need
to have \ldots in their arguments list to be safe.  Sometimes it works
when \ldots is left out and sometimes it doesn't.

Of course, one can avoid this whole issue by always defining the log
unnormalized density function and \code{outfun} to have only one argument
\code{state} and use global variables (objects in the \R global environment) to
specify any other information these functions need to use.  That too
follows the \R way.  But some people consider that bad programming practice.

A third option is to define either or both of these functions using a function
factory.  This is demonstrated in the vignette for this package named
\code{demo}, which is shown by \code{vignette("demo", "mcmc")}.
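
The following is a minimal sketch of the function factory idea (hypothetical
names, not the code in that vignette): the extra information is captured in
the closure, so the returned function needs only the \code{state} argument.
\preformatted{
make.ludfun <- function(foo) {
    force(foo)   # capture foo in the enclosing environment
    function(state) {
        i <- state[1]
        x <- state[-1]
        sum(dnorm(x, mean = 0, sd = foo[i], log = TRUE))  # toy density
    }
}
ludfun <- make.ludfun(foo = c(1, 2, 4))   # one scale per component
}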
}
\section{Philosophy of MCMC}{
This function follows the philosophy of MCMC explained in
the introductory chapter of the
\emph{Handbook of Markov Chain Monte Carlo} (Geyer, 2011a)
and in the chapter of that book on tempering and related subjects
(Geyer, 2011b).
See also the section on philosophy of \code{\link{metrop}}.
}
\section{Tuning}{
The \code{scale} argument must be adjusted so that the acceptance rate
for within-component proposals (component \code{acceptx} of the result
returned by this function)
is not too low or too high to get reasonable performance.
The log unnormalized density function must be chosen so that the acceptance rate
for jump/swap proposals (component \code{accepti} of the result
returned by this function)
is not too low or too high to get reasonable performance.
The former is a vector and the latter is a matrix, and
all these rates must be adjusted to be reasonable.

The rates in \code{accepti} are influenced by the number of components
of the tempering mixture distribution, by what those components are (how
far apart they are in some unspecified metric on probability distributions),
and by the chosen normalizing constants for those distributions.
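
For serial tempering, a sketch of one common heuristic for adjusting the log
pseudo prior (the chosen log normalizing constants) between runs follows.
This is only an illustration, not necessarily the procedure used in the
references or the vignette; \code{lpp.old} stands for the log pseudo prior
used in a trial run and \code{out} for the result of that run.
\preformatted{
occupation <- colMeans(out$ibatch)   # fraction of time in each component
stopifnot(all(occupation > 0))       # unvisited components need manual help
lpp.new <- lpp.old - log(occupation) # upweight rarely visited components
lpp.new <- lpp.new - lpp.new[1]      # additive constants do not matter
}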

For examples of tuning tempering, see Geyer (2011b) and also the vignette
of this package shown by \code{vignette("bfst", "mcmc")}.
The help for \R function \code{\link{metrop}} also gives advice on tuning
its sampler, which is relevant for tuning the \code{acceptx} rates.

See also Geyer (1991) and Geyer and Thompson (1995) for the general
theory of tuning parallel and serial tempering.
}

\references{
Geyer, C. J. (1991)
Markov chain Monte Carlo maximum likelihood.
\emph{Computing Science and Statistics: Proc. 23rd Symp. Interface}, 156--163.
\url{http://hdl.handle.net/11299/58440}.

Geyer, C. J. (2011a)
Introduction to MCMC.
In \emph{Handbook of Markov Chain Monte Carlo}. Edited by S. P. Brooks,
A. E. Gelman, G. L. Jones, and X. L. Meng.
Chapman & Hall/CRC, Boca Raton, FL, pp. 3--48.

Geyer, C. J. (2011b)
Importance Sampling, Simulated Tempering, and Umbrella Sampling.
In \emph{Handbook of Markov Chain Monte Carlo}. Edited by S. P. Brooks,
A. E. Gelman, G. L. Jones, and X. L. Meng.
Chapman & Hall/CRC, Boca Raton, FL, pp. 295--312.

Geyer, C. J. and Thompson, E. A. (1995)
Annealing Markov chain Monte Carlo with applications to ancestral inference.
\emph{Journal of the American Statistical Association} \bold{90} 909--920.
}
\examples{
d <- 9
witch.which <- c(0.1, 0.3, 0.5, 0.7, 1.0)
ncomp <- length(witch.which)

neighbors <- matrix(FALSE, ncomp, ncomp)
neighbors[row(neighbors) == col(neighbors) + 1] <- TRUE
neighbors[row(neighbors) == col(neighbors) - 1] <- TRUE

ludfun <- function(state, log.pseudo.prior = rep(0, ncomp)) {
    stopifnot(is.numeric(state))
    stopifnot(length(state) == d + 1)
    icomp <- state[1]
    stopifnot(icomp == as.integer(icomp))
    stopifnot(1 <= icomp && icomp <= ncomp)
    stopifnot(is.numeric(log.pseudo.prior))
    stopifnot(length(log.pseudo.prior) == ncomp)
    theta <- state[-1]
    if (any(theta > 1.0)) return(-Inf)
    bnd <- witch.which[icomp]
    lpp <- log.pseudo.prior[icomp]
    if (any(theta > bnd)) return(lpp)
    return(- d * log(bnd) + lpp)
}

# parallel tempering
thetas <- matrix(0.5, ncomp, d)
out <- temper(ludfun, initial = thetas, neighbors = neighbors, nbatch = 20,
    blen = 10, nspac = 5, scale = 0.56789, parallel = TRUE, debug = TRUE)

# serial tempering
theta.initial <- c(1, rep(0.5, d))
# log pseudo prior found by trial and error
qux <- c(0, 9.179, 13.73, 16.71, 20.56)

out <- temper(ludfun, initial = theta.initial, neighbors = neighbors,
    nbatch = 50, blen = 30, nspac = 2, scale = 0.56789,
    parallel = FALSE, debug = FALSE, log.pseudo.prior = qux)
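
## Illustration only: inspect documented components of the result of the
## serial tempering run: acceptance rates for within-component proposals,
## acceptance rates for jumps, and the fraction of time in each component.
out$acceptx
out$accepti
colMeans(out$ibatch)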
}
\keyword{misc}