% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/req-perform-parallel.R
\name{req_perform_parallel}
\alias{req_perform_parallel}
\title{Perform a list of requests in parallel}
\usage{
req_perform_parallel(
  reqs,
  paths = NULL,
  on_error = c("stop", "return", "continue"),
  progress = TRUE,
  max_active = 10,
  mock = getOption("httr2_mock", NULL)
)
}
\arguments{
\item{reqs}{A list of \link{request}s.}

\item{paths}{An optional character vector of paths, if you want to download
the response bodies to disk. If supplied, must be the same length as
\code{reqs}.}

\item{on_error}{What should happen if one of the requests fails?
\itemize{
\item \code{stop}, the default: stop iterating with an error.
\item \code{return}: stop iterating, returning all the successful responses
received so far, as well as an error object for the failed request.
\item \code{continue}: continue iterating, recording errors in the result.
}}

\item{progress}{Display a progress bar for the status of all requests? Use
\code{TRUE} to turn on a basic progress bar, use a string to give it a name,
or see \link{progress_bars} to customize it in other ways. Not compatible with
\code{\link[=req_progress]{req_progress()}}, as httr2 can only display a single progress bar at a
time.}

\item{max_active}{Maximum number of concurrent requests.}

\item{mock}{A mocking function. If supplied, this function is called
with the request. It should return either \code{NULL} (if it doesn't want to
handle the request) or a \link{response} (if it does). See
\code{\link[=with_mocked_responses]{with_mocked_responses()}}/\code{local_mocked_responses()} for more details.}
}
\value{
A list, the same length as \code{reqs}, containing \link{response}s and possibly
error objects, if \code{on_error} is \code{"return"} or \code{"continue"} and one of the
requests fails. If \code{on_error} is \code{"return"} and it errors on the ith
request, the ith element of the result will be an error object, and the
remaining elements will be \code{NULL}. If \code{on_error} is \code{"continue"}, it will
be a mix of responses and error objects.

Only httr2 errors are captured; see \code{\link[=req_error]{req_error()}} for more details.
}
\description{
This variation on \code{\link[=req_perform_sequential]{req_perform_sequential()}} performs multiple requests in
parallel. Never use it without \code{\link[=req_throttle]{req_throttle()}}; otherwise it's too easy to
pummel a server with a very large number of simultaneous requests.

While running, you'll get a progress bar that looks like:
\verb{[working] (1 + 4) -> 5 -> 5}. The string tells you the current status of
the queue (e.g. working, waiting, errored) followed by (the
number of pending requests + pending retried requests) -> the number of
active requests -> the number of complete requests.
\subsection{Limitations}{

The main limitation of \code{req_perform_parallel()} is that it assumes
\code{\link[=req_throttle]{req_throttle()}} and \code{\link[=req_retry]{req_retry()}} are applied across all requests. This means,
for example, that if request 1 is throttled, but request 2 is not,
\code{req_perform_parallel()} will wait for request 1 before performing request 2.
This makes it most suitable for performing many parallel requests to the same
host, rather than a mix of different hosts. It's probably possible to remove
this limitation, but it's enough work that I'm unlikely to do it unless
I know that people would find it useful: so please let me know!

Additionally, it does not respect the \code{max_tries} argument to \code{req_retry()}
because if you have five requests in flight and the first one gets rate
limited, it's likely that all the others do too. This also means that
the circuit breaker is never triggered.
}
}
\examples{
# Requesting these 4 pages one at a time would take 2 seconds:
request_base <- request(example_url()) |>
  req_throttle(capacity = 100, fill_time_s = 60)
reqs <- list(
  request_base |> req_url_path("/delay/0.5"),
  request_base |> req_url_path("/delay/0.5"),
  request_base |> req_url_path("/delay/0.5"),
  request_base |> req_url_path("/delay/0.5")
)
# But it's much faster if you request in parallel
system.time(resps <- req_perform_parallel(reqs))
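
# A hedged sketch of the max_active and progress arguments described above:
# cap concurrency at 2 and give the progress bar a name. With only 2 of the
# 4 requests in flight at once, this will likely take longer than the call
# above; the label "Fetching pages" is just an illustrative choice.
system.time(
  resps_capped <- req_perform_parallel(
    reqs,
    max_active = 2,
    progress = "Fetching pages"
  )
)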

# req_perform_parallel() will fail on error
reqs <- list(
  request_base |> req_url_path("/status/200"),
  request_base |> req_url_path("/status/400"),
  request("FAILURE")
)
try(resps <- req_perform_parallel(reqs))

# but can use on_error to capture all successful results
resps <- req_perform_parallel(reqs, on_error = "continue")
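
# A hedged sketch of on_error = "return": per the Value section, the result
# should hold the responses received before the first failure, an error
# object for the failed request, and NULL for anything not completed.
# Which request fails first can vary between runs, so no output is shown.
# (resps_return is a separate variable so `resps` above is left untouched.)
resps_return <- req_perform_parallel(reqs, on_error = "return")
vapply(resps_return, function(x) class(x)[[1]], character(1))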

# Inspect the successful responses
resps |> resps_successes()

# And the failed responses
resps |> resps_failures() |> resps_requests()
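
# A hedged sketch of the paths argument: supply one path per request to
# write each response body to disk. The "/json" endpoint is assumed to be
# served by the example app behind example_url().
json_reqs <- list(
  request_base |> req_url_path("/json"),
  request_base |> req_url_path("/json")
)
json_paths <- c(tempfile(fileext = ".json"), tempfile(fileext = ".json"))
resps_disk <- req_perform_parallel(json_reqs, paths = json_paths)
file.exists(json_paths)

# A hedged sketch of the mock argument: the function below (mock_ok is a
# hypothetical name) returns a canned response for every request, so no
# network traffic should be needed. Returning NULL instead would let the
# request go through as normal.
mock_ok <- function(req) response(status_code = 200)
resps_mock <- req_perform_parallel(json_reqs, mock = mock_ok, progress = FALSE)
resp_status(resps_mock[[1]])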
}