<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>progressr: Parallel and Distributed Processing</title>
<style>
body {
  font-family: sans-serif;
  line-height: 1.6;
  padding-left: 3ex;
  padding-right: 3ex;
  background-color: white;
  color: black;
}

a {
  color: #4183C4;
  text-decoration: none;
}

h1, h2, h3 {
  margin: 2ex 0 1ex;
  padding: 0;
  font-weight: bold;
  -webkit-font-smoothing: antialiased;
  cursor: text;
  position: relative;
}

h2 {
  border-bottom: 1px solid #cccccc;
}

code {
  margin: 0 2px;
  padding: 0 5px;
  white-space: nowrap;
  border: 1px solid #eaeaea;
  background-color: #f8f8f8;
  border-radius: 3px;
}

pre code {
  margin: 0;
  padding: 0;
  white-space: pre;
  border: none;
  background: transparent;
}

pre {
  background-color: #f8f8f8;
  border: 1px solid #cccccc;
  line-height: 1.5;
  overflow: auto;
  padding: 0.6ex 1ex;
  border-radius: 3px;
}

</style>
</head>
<body>
<h1>progressr: Parallel and Distributed Processing</h1>
<!--
%\VignetteIndexEntry{progressr: Parallel and Distributed Processing}
%\VignetteAuthor{Henrik Bengtsson}
%\VignetteKeyword{R}
%\VignetteKeyword{package}
%\VignetteKeyword{vignette}
%\VignetteKeyword{progress}
%\VignetteKeyword{parallel}
%\VignetteKeyword{distributed}
%\VignetteEngine{progressr::selfonly}
-->
<h2>TL;DR</h2>
<p>The <strong>progressr</strong> package works seamlessly with parallel and
distributed processing using <strong><a href="https://www.futureverse.org">futureverse</a></strong>, and it will also
provide near-live progress updates while the parallel processing is
still running. For example,</p>
<pre><code class="language-r">library(future)
library(progressr)
plan(multisession, workers = 2)
handlers(global = TRUE)
handlers(&quot;progress&quot;)

my_fcn &lt;- function(xs) {
  p &lt;- progressr::progressor(along = xs)
  future.apply::future_lapply(xs, function(x, ...) {
    Sys.sleep((10.0-x)/2)
    p(sprintf(&quot;x=%g&quot;, x))
    sqrt(x)
  })
}

y &lt;- my_fcn(1:10)
# / [================&gt;-----------------------------]  40% x=2
</code></pre>
<h2>Introduction</h2>
<p>The <strong><a href="https://www.futureverse.org">futureverse</a></strong> framework, which provides a unified API for parallel
and distributed processing in R, has built-in support for the kind of
progression updates produced by the <strong>progressr</strong> package.  This means
that you can use it with, for instance, <strong><a href="https://future.apply.futureverse.org">future.apply</a></strong>, <strong><a href="https://furrr.futureverse.org">furrr</a></strong>,
and <strong><a href="https://cran.r-project.org/package=foreach">foreach</a></strong> with <strong><a href="https://doFuture.futureverse.org">doFuture</a></strong>, and <strong><a href="https://cran.r-project.org/package=plyr">plyr</a></strong> or
<strong><a href="https://www.bioconductor.org/packages/BiocParallel/">BiocParallel</a></strong> with <strong>doFuture</strong>.  In contrast, <em>non-future</em>
parallelization methods, such as <strong>parallel</strong>'s <code>mclapply()</code> and
<code>parLapply()</code>, and <strong>foreach</strong> adapters like <strong>doParallel</strong>,
do <em>not</em> support progress reports via <strong>progressr</strong>.</p>
<h3>future_lapply() - parallel lapply()</h3>
<p>Here is an example that uses <code>future_lapply()</code> of the <strong><a href="https://future.apply.futureverse.org">future.apply</a></strong> package to parallelize on the local machine while at the same time signaling progression updates:</p>
<pre><code class="language-r">library(future.apply)
plan(multisession, workers = 2)

library(progressr)
handlers(global = TRUE)

my_fcn &lt;- function(xs) {
  p &lt;- progressor(along = xs)
  future_lapply(xs, function(x, ...) {
    Sys.sleep((10.0-x)/2)
    p(sprintf(&quot;x=%g&quot;, x))
    sqrt(x)
  })
}

y &lt;- my_fcn(1:10)
# / [================&gt;-----------------------------]  40% x=2
</code></pre>
<h3>foreach() with doFuture</h3>
<p>Here is an example that uses <code>foreach()</code> of the <strong><a href="https://cran.r-project.org/package=foreach">foreach</a></strong> package
together with <code>%dofuture%</code> of the <strong><a href="https://doFuture.futureverse.org">doFuture</a></strong> package to
parallelize while reporting on progress.  This example parallelizes on
the local machine, but it also works with remote machines:</p>
<pre><code class="language-r">library(doFuture)    ## %dofuture%
plan(multisession, workers = 2)

library(progressr)
handlers(global = TRUE)
handlers(&quot;progress&quot;)

my_fcn &lt;- function(xs) {
  p &lt;- progressor(along = xs)
  foreach(x = xs) %dofuture% {
    Sys.sleep((10.0-x)/2)
    p(sprintf(&quot;x=%g&quot;, x))
    sqrt(x)
  }
}

y &lt;- my_fcn(1:10)
# / [================&gt;-----------------------------]  40% x=2
</code></pre>
<p>For existing code using the traditional <code>%dopar%</code> operators of the
<strong><a href="https://cran.r-project.org/package=foreach">foreach</a></strong> package, we can register the <strong><a href="https://doFuture.futureverse.org">doFuture</a></strong> adapter and
use the same <strong>progressr</strong> as above to progress updates;</p>
<pre><code class="language-r">library(doFuture)
registerDoFuture()      ## %dopar% parallelizes via future
plan(multisession, workers = 2)

library(progressr)
handlers(global = TRUE)
handlers(&quot;progress&quot;)

my_fcn &lt;- function(xs) {
  p &lt;- progressor(along = xs)
  foreach(x = xs) %dopar% {
    Sys.sleep((10.0-x)/2)
    p(sprintf(&quot;x=%g&quot;, x))
    sqrt(x)
  }
}

y &lt;- my_fcn(1:10)
# / [================&gt;-----------------------------]  40% x=2
</code></pre>
<h3>future_map() - parallel purrr::map()</h3>
<p>Here is an example that uses <code>future_map()</code> of the <strong><a href="https://furrr.futureverse.org">furrr</a></strong> package
to parallelize on the local machine while at the same time signaling
progression updates:</p>
<pre><code class="language-r">library(furrr)
plan(multisession, workers = 2)

library(progressr)
handlers(global = TRUE)
handlers(&quot;progress&quot;)

my_fcn &lt;- function(xs) {
  p &lt;- progressor(along = xs)
  future_map(xs, function(x) {
    Sys.sleep((10.0-x)/2)
    p(sprintf(&quot;x=%g&quot;, x))
    sqrt(x)
  })
}

y &lt;- my_fcn(1:10)
# / [================&gt;-----------------------------]  40% x=2
</code></pre>
<p><em>Note:</em> This solution does not involve the <code>.progress = TRUE</code>
argument that <strong>furrr</strong> implements.  Because <strong>progressr</strong> is more
generic, and because <code>.progress = TRUE</code> only supports certain future
backends and produces errors on unsupported backends, I recommend
no longer using <code>.progress = TRUE</code> and using the <strong>progressr</strong>
package instead.</p>
<h3>BiocParallel::bplapply() - parallel lapply()</h3>
<p>Here is an example that uses <code>bplapply()</code> of the <strong><a href="https://www.bioconductor.org/packages/BiocParallel/">BiocParallel</a></strong>
package to parallelize on the local machine while at the same time
signaling progression updates:</p>
<pre><code class="language-r">library(BiocParallel)
library(doFuture)
register(DoparParam())  ## BiocParallel parallelizes via %dopar%
registerDoFuture()      ## %dopar% parallelizes via future
plan(multisession, workers = 2)

library(progressr)
handlers(global = TRUE)
handlers(&quot;progress&quot;)

my_fcn &lt;- function(xs) {
  p &lt;- progressor(along = xs)
  bplapply(xs, function(x) {
    Sys.sleep((10.0-x)/2)
    p(sprintf(&quot;x=%g&quot;, x))
    sqrt(x)
  })
}

y &lt;- my_fcn(1:10)
# / [================&gt;-----------------------------]  40% x=2
</code></pre>
<h3>plyr::llply(..., .parallel = TRUE) with doFuture</h3>
<p>Here is an example that uses <code>llply()</code> of the <strong><a href="https://cran.r-project.org/package=plyr">plyr</a></strong> package to
parallelize on the local machine while at the same time signaling
progression updates:</p>
<pre><code class="language-r">library(plyr)
library(doFuture)
registerDoFuture()      ## %dopar% parallelizes via future
plan(multisession, workers = 2)

library(progressr)
handlers(global = TRUE)
handlers(&quot;progress&quot;)

my_fcn &lt;- function(xs) {
  p &lt;- progressor(along = xs)
  llply(xs, function(x, ...) {
    Sys.sleep((10.0-x)/2)
    p(sprintf(&quot;x=%g&quot;, x))
    sqrt(x)
  }, .parallel = TRUE)
}

y &lt;- my_fcn(1:10)
# / [================&gt;-----------------------------]  40% x=2
</code></pre>
<p><em>Note:</em> As an alternative to the above recommended approach, one can
use <code>.progress = &quot;progressr&quot;</code> together with <code>.parallel = TRUE</code>.  This
requires <strong>plyr</strong> (&gt;= 1.8.7).</p>
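<p>That alternative looks roughly as follows.  This is a minimal sketch, assuming <strong>plyr</strong> (&gt;= 1.8.7) is installed; with <code>.progress = &quot;progressr&quot;</code>, <strong>plyr</strong> creates the progressor itself, so no explicit <code>progressor()</code> call is needed:</p>
<pre><code class="language-r">library(plyr)
library(doFuture)
registerDoFuture()      ## %dopar% parallelizes via future
plan(multisession, workers = 2)

library(progressr)
handlers(global = TRUE)
handlers(&quot;progress&quot;)

my_fcn &lt;- function(xs) {
  ## plyr signals the progress updates on our behalf
  llply(xs, function(x, ...) {
    Sys.sleep((10.0-x)/2)
    sqrt(x)
  }, .progress = &quot;progressr&quot;, .parallel = TRUE)
}

y &lt;- my_fcn(1:10)
</code></pre>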
<h3>Near-live versus buffered progress updates with futures</h3>
<p>As of August 2025, there are six types of <strong>future</strong> backends that are
known(*) to provide near-live progress updates:</p>
<ol>
<li><code>sequential</code>,</li>
<li><code>multicore</code>,</li>
<li><code>multisession</code>,</li>
<li><code>cluster</code> (local and remote),</li>
<li><code>future.callr::callr</code>, and</li>
<li><code>future.mirai::mirai_multisession</code>.</li>
</ol>
<p>Here &quot;near-live&quot; means that the progress handlers will report on
progress almost immediately when the progress is signaled on the
worker. This is because these parallel backends handle the special
condition class <code>immediateCondition</code> - they detect when such
conditions are signaled and relay them to the parent R process as soon
as possible. For all other future backends, the progress updates are
only relayed back to the main machine and reported together with the
results of the futures.  For instance, if <code>future_lapply(X, FUN)</code>
chunks up the processing of, say, 100 elements in <code>X</code> into eight
futures, we will see progress from each of the 100 elements as they
complete when using a future backend that supports &quot;near-live&quot; updates,
whereas we will only see those updates flushed eight times when
using any other type of future backend.</p>
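<p>The relaying mechanism can be illustrated in plain R, without any parallel workers.  The condition object and handler below are only a simplified sketch of what a &quot;near-live&quot; backend does; the actual relaying logic inside the <strong>future</strong> backends is more involved:</p>
<pre><code class="language-r">## A condition object of class 'immediateCondition' (simplified sketch
## of the kind of condition that progress updates are wrapped in)
cond &lt;- structure(
  class = c(&quot;immediateCondition&quot;, &quot;condition&quot;),
  list(message = &quot;x=2 done&quot;, call = NULL)
)

## A backend supporting near-live updates effectively does this:
## intercept the condition as soon as it is signaled and relay it
withCallingHandlers({
  signalCondition(cond)   ## &quot;worker&quot; side: signal the progress update
}, immediateCondition = function(c) {
  message(&quot;relayed: &quot;, conditionMessage(c))  ## &quot;parent&quot; side: relay promptly
})
</code></pre>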
<p>(*) Other future backends may gain support for &quot;near-live&quot; progress
updating later.  Adding support for those is independent of the
<strong>progressr</strong> package.  Feature requests for adding that support
should go to those future-backend packages.</p>
</body>
</html>