File: runParallel.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/runParallel.r
\name{runParallel}
\alias{runParallel}
\title{runParallel}
\usage{
runParallel(
  onecore,
  reps,
  seed = round(runif(1, 0, 10000)),
  cores = max(1, parallel::detectCores() - 1),
  simplify = TRUE,
  along
)
}
\arguments{
\item{onecore}{function to run the analysis on one core}

\item{reps}{total number of repetitions}

\item{seed}{specifies the base random number seed.  The seed used for core i will be \code{seed} + \code{i}.}

\item{cores}{number of cores to use, defaulting to one less than the number available}

\item{simplify}{set to \code{FALSE} to keep the outer list when a \code{onecore} result has only one element; by default the single element itself is returned}

\item{along}{see Details}
}
\value{
result from combining all the parallel runs, formatted to be as similar as possible to the result produced from one run
}
\description{
parallel Package Easy Front-End
}
\details{
Given a function \code{onecore} that runs the needed set of simulations on
one CPU core, and given a total number of repetitions \code{reps}, determines
the number of cores available on your machine and by default uses one less than that.
\code{reps} is divided as evenly as possible over these cores, and batches
are run on the cores using the \code{parallel} package \code{mclapply} function.
The current per-core repetition number is continually updated in
your system's temporary directory (/tmp for Linux and Mac, TEMP for Windows)
in a file named progressX.log, where X is the core number.
The random number seed is set for each core and is equal to
the scalar \code{seed} - core number + 1.  The default seed is a random
number between 0 and 10000 but it's best if the user provides the
seed so the simulation is reproducible.
The total run time is computed and printed.
\code{onecore} must create a named list of all the results created during
that one simulation batch.  Elements of this list must be data frames,
vectors, matrices, or arrays.   Upon completion of all batches,
all the results are rbind'd and saved in a single list.
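
As a rough illustration of dividing \code{reps} as evenly as possible, the
following sketch uses a hypothetical helper \code{split_evenly} (not part of
the package interface).  Which cores receive the larger batches is an
assumption made for the example.

\preformatted{
# Hypothetical helper: split n repetitions over m cores so that
# batch sizes differ by at most one.
split_evenly <- function(n, m) {
  base  <- floor(n / m)    # minimum batch size per core
  extra <- n - base * m    # number of cores that get one more repetition
  c(rep(base + 1, extra), rep(base, m - extra))
}

split_evenly(10, 4)   # 3 3 2 2
}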

\code{onecore} must have an argument \code{reps} that tells the function
how many simulations to run for one batch, another argument \code{showprogress}
which is a function to be called inside \code{onecore} to write to the
progress file for the current core and repetition, and an argument \code{core}
which informs \code{onecore} which sequential core number (batch number) it is
processing.
When calling \code{showprogress} inside \code{onecore}, the arguments, in order,
must be the integer value of the repetition to be noted, the number of reps,
\code{core}, an optional 4th argument \code{other} that can contain a single
character string to add to the output, and an optional 5th argument \code{pr}.
You can set \code{pr=FALSE} to suppress printing and have \code{showprogress}
return the file name for holding progress information if you want to
customize printing.
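
A minimal sketch of a function satisfying these requirements is shown below.
The name \code{sim_means}, the simulated quantity, and the sample size are
hypothetical and chosen only for illustration.

\preformatted{
# Hypothetical batch function: each repetition computes the mean of
# 20 standard normal draws.
sim_means <- function(reps, showprogress, core) {
  means <- numeric(reps)
  for (i in seq_len(reps)) {
    means[i] <- mean(rnorm(20))
    # Arguments in order: repetition, number of reps, core, optional text
    showprogress(i, reps, core, other = "means")
  }
  # Named list whose elements are vectors
  list(means = means, core = rep(core, reps))
}

# Assumed call; the seed is supplied so the run is reproducible:
# res <- runParallel(sim_means, reps = 1000, seed = 1, cores = 2)
}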

If any of the objects appearing as list elements produced by \code{onecore}
are multi-dimensional arrays, you must specify an integer value for
\code{along}.  This specifies to the \code{abind} package \code{abind} function
the dimension along which to bind the arrays.  For example, if the
first dimension of the array corresponds to repetitions, you would
specify \code{along=1}.   All arrays present must use the same \code{along} unless
\code{along} is a named vector and the names match elements of the
simulation result object.
Set \code{simplify=FALSE} if you don't want the result simplified when
\code{onecore} produces only one list element.  The default returns the
first (and only) list element rather than the list if there is only one
element.
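
The following sketch shows a batch function whose result is a
three-dimensional array with repetitions on the first dimension, the case
in which \code{along=1} would be specified.  The name \code{sim_array} and
the simulated quantity are hypothetical.

\preformatted{
# Hypothetical batch function returning a reps x 2 x 2 array:
# one 2 x 2 covariance matrix per repetition.
sim_array <- function(reps, showprogress, core) {
  est <- array(NA_real_, dim = c(reps, 2, 2))
  for (i in seq_len(reps)) {
    x <- matrix(rnorm(40), ncol = 2)   # 20 observations on 2 variables
    est[i, , ] <- cov(x)
    showprogress(i, reps, core)
  }
  list(est = est)
}

# Assumed call; along = 1 binds the per-core arrays over repetitions:
# res <- runParallel(sim_array, reps = 200, seed = 2, along = 1)
}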

When \code{onecore} returns a \code{data.table}, \code{runParallel} simplifies all this and merely
rbinds all the per-core data tables into one large data table.  In that case when you
have \code{onecore} include a column containing a simulation number, it is wise to prefix
that number with the core number so that you will have unique simulation IDs when
all the cores' results are combined.
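
One way to build such IDs is sketched below.  It assumes the
\code{data.table} package is installed.  The name \code{sim_dt}, the column
names, and the \code{"-"} separator are hypothetical.

\preformatted{
# Hypothetical batch function returning a data.table whose id column
# combines the core number and the within-core repetition number.
sim_dt <- function(reps, showprogress, core) {
  data.table::data.table(
    sim  = paste(core, seq_len(reps), sep = "-"),
    mean = vapply(seq_len(reps), function(i) mean(rnorm(20)), numeric(1))
  )
}

sim_dt(2, showprogress = NULL, core = 3)$sim   # "3-1" "3-2"
}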

See \href{https://hbiostat.org/rflow/parallel.html}{here} for examples.
}
\author{
Frank Harrell
}