\name{foreach}
\alias{foreach}
\alias{when}
\alias{times}
\alias{\%:\%}
\alias{\%do\%}
\alias{\%dopar\%}
\title{foreach}
\description{
\code{\%do\%} and \code{\%dopar\%} are binary operators that operate
on a \code{foreach} object and an \code{R} expression.
The expression, \code{ex}, is evaluated multiple times in an environment
that is created by the \code{foreach} object, and that environment is
modified for each evaluation as specified by the \code{foreach} object.
\code{\%do\%} evaluates the expression sequentially, while \code{\%dopar\%}
evaluates it in parallel.
The results of evaluating \code{ex} are returned as a list by default,
but this can be modified by means of the \code{.combine} argument.
}
\usage{
foreach(..., .combine, .init, .final=NULL, .inorder=TRUE,
.multicombine=FALSE,
.maxcombine=if (.multicombine) 100 else 2,
.errorhandling=c('stop', 'remove', 'pass'),
.packages=NULL, .export=NULL, .noexport=NULL,
.verbose=FALSE)
when(cond)
e1 \%:\% e2
obj \%do\% ex
obj \%dopar\% ex
times(n)
}
\arguments{
\item{\dots}{one or more arguments that control how \code{ex} is
evaluated. Named arguments specify the name and values of variables
to be defined in the evaluation environment.
An unnamed argument can be used to specify the number of times that
\code{ex} should be evaluated.
At least one argument must be specified in order to define the
number of times \code{ex} should be executed.}
\item{.combine}{function that is used to process the task results as
they are generated. This can be specified as either a function or
a non-empty character string naming the function.
Specifying 'c' is useful for concatenating the results into
a vector, for example. The values 'cbind' and 'rbind' can combine
vectors into a matrix. The values '+' and '*' can be used to
process numeric data.
By default, the results are returned in a list.}
\item{.init}{initial value to pass as the first argument of the
\code{.combine} function.
This should not be specified unless \code{.combine} is also specified.}
\item{.final}{function of one argument that is called to return the final result.}
\item{.inorder}{logical flag indicating whether the \code{.combine}
function requires the task results to be combined in the same order
that they were submitted. If the order is not important, then
setting \code{.inorder} to \code{FALSE} can give improved performance.
The default value is \code{TRUE}.}
\item{.multicombine}{logical flag indicating whether the \code{.combine}
function can accept more than two arguments.
If an arbitrary \code{.combine} function is specified, by default,
that function will always be called with two arguments.
If it can take more than two arguments, then setting \code{.multicombine}
to \code{TRUE} could improve the performance.
The default value is \code{FALSE} unless the \code{.combine}
function is \code{cbind}, \code{rbind}, or \code{c}, which are known
to take more than two arguments.}
\item{.maxcombine}{maximum number of arguments to pass to the combine function.
This is only relevant if \code{.multicombine} is \code{TRUE}.}
\item{.errorhandling}{specifies how a task evaluation error should be handled.
If the value is "stop", then execution will be stopped via
the \code{stop} function if an error occurs.
If the value is "remove", the result for that task will not be
returned, or passed to the \code{.combine} function.
If it is "pass", then the error object generated by task evaluation
will be included with the rest of the results. It is assumed that
the combine function (if specified) will be able to deal with the
error object.
The default value is "stop".}
\item{.packages}{character vector of packages that the tasks depend on.
If \code{ex} requires an \code{R} package to be loaded, this option
can be used to load that package on each of the workers.
Ignored when used with \code{\%do\%}.}
\item{.export}{character vector of variables to export.
This can be useful when accessing a variable that isn't defined in the
current environment.
The default value is \code{NULL}.}
\item{.noexport}{character vector of variables to exclude from exporting.
This can be useful to prevent variables from being exported that aren't
actually needed, perhaps because the symbol is used in a model formula.
The default value is \code{NULL}.}
\item{.verbose}{logical flag enabling verbose messages. This can be
very useful for troubleshooting.}
\item{obj}{\code{foreach} object used to control the evaluation
of \code{ex}.}
\item{e1}{\code{foreach} object to merge.}
\item{e2}{\code{foreach} object to merge.}
\item{ex}{the \code{R} expression to evaluate.}
\item{cond}{condition to evaluate.}
\item{n}{number of times to evaluate the \code{R} expression.}
}
\details{
The \code{foreach} and \code{\%do\%}/\code{\%dopar\%} operators provide
a looping construct that can be viewed as a hybrid of the standard
\code{for} loop and \code{lapply} function.
It looks similar to the \code{for} loop, and it evaluates an expression,
rather than a function (as in \code{lapply}), but its purpose is to
return a value (a list, by default), rather than to cause side-effects.
This facilitates parallelization, but looks more natural to people who
prefer \code{for} loops to \code{lapply}.

The \code{\%:\%} operator is the \emph{nesting} operator, used for creating
nested foreach loops. Type \code{vignette("nested")} at the R prompt for
more details.

Parallel computation depends upon a \emph{parallel backend} that must be
registered before performing the computation. The parallel backends available
will be system-specific, but include \pkg{doParallel}, which uses R's built-in
\pkg{parallel} package, \pkg{doMC}, which uses the \pkg{multicore} package,
and \pkg{doSNOW}. Each parallel backend has a specific registration function,
such as \code{registerDoParallel} or \code{registerDoSNOW}.

The \code{times} function is a simple convenience function that calls
\code{foreach}. It is useful for evaluating an \code{R} expression multiple
times when there are no varying arguments. This can be convenient for
resampling, for example.
}
\seealso{
\code{\link[iterators]{iter}}
}
\examples{
# equivalent to rnorm(3)
times(3) \%do\% rnorm(1)
# equivalent to lapply(1:3, sqrt)
foreach(i=1:3) \%do\%
sqrt(i)
# equivalent to colMeans(m)
m <- matrix(rnorm(9), 3, 3)
foreach(i=1:ncol(m), .combine=c) \%do\%
mean(m[,i])
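# a minimal sketch of .init and .final, using illustrative values that are
# not taken from the examples above: .init seeds the running sum with 0,
# and .final post-processes the combined vector once all tasks are done
foreach(i=1:4, .combine='+', .init=0) \%do\% i
foreach(i=1:3, .combine=c, .final=function(x) sum(x)) \%do\% i^2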
# normalize the rows of a matrix in parallel, with parentheses used to
# force proper operator precedence
# Need to register a parallel backend before this example will run
# in parallel
foreach(i=1:nrow(m), .combine=rbind) \%dopar\%
(m[i,] / mean(m[i,]))
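# a hedged sketch of registering a backend so that \%dopar\% loops such as
# the one above run in parallel; it assumes the doParallel package is
# installed, and the worker count of 2 is an arbitrary choice
\dontrun{
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
foreach(i=1:nrow(m), .combine=rbind) \%dopar\%
(m[i,] / mean(m[i,]))
stopCluster(cl)
registerDoSEQ()  # return to sequential execution afterwards
}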
# simple (and inefficient) parallel matrix multiply
library(iterators)
a <- matrix(1:16, 4, 4)
b <- t(a)
foreach(b=iter(b, by='col'), .combine=cbind) \%dopar\%
(a \%*\% b)
# split a data frame by row, and put them back together again without
# changing anything
d <- data.frame(x=1:10, y=rnorm(10))
s <- foreach(d=iter(d, by='row'), .combine=rbind) \%dopar\% d
identical(s, d)
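# a hedged sketch of .errorhandling: the failure for i == 2 is contrived
# for illustration; with 'remove' that task's result is dropped, so only
# the values for i = 1 and i = 3 reach the .combine function
foreach(i=1:3, .combine=c, .errorhandling='remove') \%do\% {
  if (i == 2) stop('illustrative failure')
  sqrt(i)
}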
# a quick sort function
qsort <- function(x) {
n <- length(x)
if (n == 0) {
x
} else {
p <- sample(n, 1)
smaller <- foreach(y=x[-p], .combine=c) \%:\% when(y <= x[p]) \%do\% y
larger <- foreach(y=x[-p], .combine=c) \%:\% when(y > x[p]) \%do\% y
c(qsort(smaller), x[p], qsort(larger))
}
}
qsort(runif(12))
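# a minimal sketch of the nesting operator on its own, with illustrative
# ranges: the inner loop builds one row with c, and the outer loop stacks
# the rows with rbind, giving a 2 x 3 matrix of the products i * j
foreach(i=1:2, .combine=rbind) \%:\% foreach(j=1:3, .combine=c) \%do\% (i * j)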
}
\keyword{utilities}