% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/availableCores.R
\name{availableCores}
\alias{availableCores}
\title{Get Number of Available Cores on The Current Machine}
\usage{
availableCores(
  constraints = NULL,
  methods = getOption2("parallelly.availableCores.methods", c("system",
    "/proc/self/status", "cgroups.cpuset", "cgroups.cpuquota", "cgroups2.cpu.max",
    "nproc", "mc.cores", "BiocParallel", "_R_CHECK_LIMIT_CORES_", "Bioconductor", "LSF",
    "PJM", "PBS", "SGE", "Slurm", "fallback", "custom")),
  na.rm = TRUE,
  logical = getOption2("parallelly.availableCores.logical", TRUE),
  default = c(current = 1L),
  which = c("min", "max", "all"),
  omit = getOption2("parallelly.availableCores.omit", 0L)
)
}
\arguments{
\item{constraints}{An optional character specifying under what
constraints ("purposes") we are requesting the values.
For instance, on systems where multicore processing is not supported
(i.e. Windows), using \code{constraints = "multicore"} will force a
single core to be reported.
Using \code{constraints = "connections"} will append \code{"connections"} to
the \code{methods} argument.
It is possible to specify multiple constraints, e.g.
\code{constraints = c("connections", "multicore")}.}

\item{methods}{A character vector specifying how to infer the number
of available cores.}

\item{na.rm}{If TRUE, only non-missing settings are considered/returned.}

\item{logical}{Passed to
\code{\link[parallel]{detectCores}(logical = logical)}, which,
\emph{if supported}, returns the number of logical CPUs (TRUE) or physical
CPUs/cores (FALSE).
At least as of R 4.2.2, \code{detectCores()} ignores this argument on Linux.
This argument is only used if argument \code{methods} includes \code{"system"}.}

\item{default}{The default number of cores to return if no non-missing
settings are available.}

\item{which}{A character specifying which settings to return.
If \code{"min"} (default), the minimum value is returned.
If \code{"max"}, the maximum value is returned (be careful!).
If \code{"all"}, all values are returned.}

\item{omit}{(integer; non-negative) Number of cores to not include.}
}
\value{
Returns a positive (>= 1) integer.
If \code{which = "all"}, then more than one value may be returned.
Together with \code{na.rm = FALSE}, missing values may also be returned.
}
\description{
The current/main \R session counts as one, meaning the minimum
number of cores available is always at least one.
}
\details{
The following settings ("methods") for inferring the number of cores
are supported:
\itemize{
\item \code{"system"} -
Query \code{\link[parallel]{detectCores}(logical = logical)}.

\item \code{"/proc/self/status"} -
Query \code{Cpus_allowed_list} of \verb{/proc/self/status}.

\item \code{"cgroups.cpuset"} -
On Unix, query control group (cgroup v1) value \code{cpuset.set}.

\item \code{"cgroups.cpuquota"} -
On Unix, query control group (cgroup v1) value
\code{cpu.cfs_quota_us} / \code{cpu.cfs_period_us}.

\item \code{"cgroups2.cpu.max"} -
On Unix, query control group (cgroup v2) value \code{cpu.max}.

\item \code{"nproc"} -
On Unix, query system command \code{nproc}.

\item \code{"mc.cores"} -
If available, returns the value of option
\code{\link[base:options]{mc.cores}}.
Note that \code{mc.cores} is defined as the number of
\emph{additional} \R processes that can be used in addition to the
main \R process.  This means that with \code{mc.cores = 0} all
calculations should be done in the main \R process, i.e. we have
exactly one core available for our calculations.
The \code{mc.cores} option defaults to environment variable
\env{MC_CORES} (and is set accordingly when the \pkg{parallel}
package is loaded).  The \code{mc.cores} option is used by, for
instance, \code{\link[=mclapply]{mclapply}()} of the \pkg{parallel}
package.

\item \code{"connections"} -
Query the current number of available R connections per
\code{\link[=freeConnections]{freeConnections()}}.  This is the maximum number of socket-based
\strong{parallel} cluster nodes that are possible to launch, because each
one needs its own R connection.
The exception is when \code{freeConnections()} is zero, in which case
\code{1L} is still returned, because \code{availableCores()} should always
return a positive integer.

\item \code{"BiocParallel"} -
Query environment variable \env{BIOCPARALLEL_WORKER_NUMBER} (integer),
which is defined and used by \strong{BiocParallel} (>= 1.27.2).
If set, this is the number of cores considered.

\item \code{"_R_CHECK_LIMIT_CORES_"} -
Query environment variable \env{_R_CHECK_LIMIT_CORES_} (logical or
\code{"warn"}) used by \verb{R CMD check} and set to true by
\verb{R CMD check --as-cran}. If set to a non-false value, then a maximum
of 2 cores is considered.

\item \code{"Bioconductor"} -
Query environment variable \env{IS_BIOC_BUILD_MACHINE} (logical)
used by the Bioconductor (>= 3.16) build and check system. If set to
true, then a maximum of 4 cores is considered.

\item \code{"LSF"} -
Query Platform Load Sharing Facility (LSF) environment variable
\env{LSB_DJOB_NUMPROC}.
Jobs with multiple (CPU) slots can be submitted on LSF using
\verb{bsub -n 2 -R "span[hosts=1]" < hello.sh}.

\item \code{"PJM"} -
Query Fujitsu Technical Computing Suite (that we choose to shorten
as "PJM") environment variables \env{PJM_VNODE_CORE} and
\env{PJM_PROC_BY_NODE}.
The first is set when submitted with \verb{pjsub -L vnode-core=8 hello.sh}.

\item \code{"PBS"} -
Query TORQUE/PBS environment variables \env{PBS_NUM_PPN} and \env{NCPUS}.
Depending on PBS system configuration, these \emph{resource}
parameters may or may not default to one.
An example of a job submission that results in this is
\verb{qsub -l nodes=1:ppn=2}, which requests one node with two cores.

\item \code{"SGE"} -
Query Sun Grid Engine/Oracle Grid Engine/Son of Grid Engine (SGE)
and Univa Grid Engine (UGE)/Altair Grid Engine (AGE) environment
variable \env{NSLOTS}.
An example of a job submission that results in this is
\verb{qsub -pe smp 2} (or \verb{qsub -pe by_node 2}), which
requests two cores on a single machine.

\item \code{"Slurm"} -
Query Simple Linux Utility for Resource Management (Slurm)
environment variable \env{SLURM_CPUS_PER_TASK}.
This may or may not be set.  It can be set when submitting a job,
e.g. \verb{sbatch --cpus-per-task=2 hello.sh} or by adding
\verb{#SBATCH --cpus-per-task=2} to the \file{hello.sh} script.
If \env{SLURM_CPUS_PER_TASK} is not set, then it will fall back to
use \env{SLURM_CPUS_ON_NODE} if the job is a single-node job
(\env{SLURM_JOB_NUM_NODES} is 1), e.g. \verb{sbatch --ntasks=2 hello.sh}.
To make sure all tasks are assigned to a single node, specify
\code{--nodes=1}, e.g. \verb{sbatch --nodes=1 --ntasks=16 hello.sh}.

\item \code{"custom"} -
If option
\code{\link[=parallelly.options]{parallelly.availableCores.custom}}
is set to a function,
then this function will be called (without arguments) and its value
will be coerced to an integer, which will be interpreted as a number
of available cores.  If the value is NA, then it will be ignored.
It is safe for this custom function to call \code{availableCores()}; if
done, the custom function will \emph{not} be recursively called.
}
For any other value of a \code{methods} element, the \R option with the
same name is queried.  If that is not set, the system environment
variable is queried.  If neither is set, a missing value is returned.
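
As a minimal sketch of this fallback, the following uses a made-up method
name, \code{"my.cores"}, which is purely illustrative and not something
\pkg{parallelly} defines.  It assumes that the package is installed and
that no environment variable of the same name interferes:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{## "my.cores" is a hypothetical name chosen for this illustration
oopts <- options(my.cores = 3L)

## Only the R option named "my.cores" is consulted here
n <- parallelly::availableCores(methods = "my.cores")
print(n)

## Restore the previous options
options(oopts)
}\if{html}{\out{</div>}}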
}
\section{Avoid ending up with zero cores}{

Note that some machines might have a limited number of cores, or the R
process runs in a container or a cgroup that only provides a small number
of cores.  In such cases:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{ncores <- availableCores() - 1
}\if{html}{\out{</div>}}

may return zero, which is often not intended and is likely to give an
error downstream.  Instead, use:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{ncores <- availableCores(omit = 1)
}\if{html}{\out{</div>}}

to put aside one of the cores from being used.  Regardless of how many cores
you put aside, this function is guaranteed to return at least one core.
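
The guarantee can be illustrated with a small sketch that asks to put
aside as many cores as are reported.  It assumes that \pkg{parallelly} is
installed; the variable names are illustrative only:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{## Number of cores reported under the current settings
n_total <- parallelly::availableCores()

## Ask to put aside every reported core; the result is floored at one
n_left <- parallelly::availableCores(omit = n_total)
print(n_left)
}\if{html}{\out{</div>}}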
}

\section{Advanced usage}{

It is possible to override the maximum number of cores on the machine
as reported by \code{availableCores(methods = "system")}.  This can be
done by first specifying
\code{options(parallelly.availableCores.methods = "mc.cores")} and
then the number of cores to use, e.g. \code{options(mc.cores = 8)}.
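
A minimal sketch of this override, assuming that \pkg{parallelly} is
installed, is given below.  The previous option values are saved and
restored so that the change is only temporary:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{## Restrict the lookup to "mc.cores" and request eight cores
oopts <- options(
  parallelly.availableCores.methods = "mc.cores",
  mc.cores = 8L
)

## Only the 'mc.cores' option is considered now
n <- parallelly::availableCores()
print(n)

## Restore the previous options
options(oopts)
}\if{html}{\out{</div>}}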
}

\examples{
message(paste("Number of cores available:", availableCores()))

\dontrun{
options(mc.cores = 2L)
message(paste("Number of cores available:", availableCores()))
}

\dontrun{
## IMPORTANT: availableCores() may return 1L
options(mc.cores = 1L)
ncores <- availableCores() - 1      ## ncores = 0
ncores <- availableCores(omit = 1)  ## ncores = 1
message(paste("Number of cores to use:", ncores))
}

\dontrun{
## Use 75\% of the cores on the system but never more than four
options(parallelly.availableCores.custom = function() {
  ncores <- max(parallel::detectCores(), 1L, na.rm = TRUE)
  ncores <- min(as.integer(0.75 * ncores), 4L)
  max(1L, ncores)
})
message(paste("Number of cores available:", availableCores()))

## Use 50\% of the cores according to availableCores(), e.g.
## allocated by a job scheduler or cgroups.
## Note that it is safe to call availableCores() here.
options(parallelly.availableCores.custom = function() {
  0.50 * parallelly::availableCores()
})
message(paste("Number of cores available:", availableCores()))
}
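
## A minimal sketch: inspect what each method reports instead of only
## the minimum.  With the default na.rm = TRUE, methods that yield no
## value are dropped from the result.
ncores_all <- parallelly::availableCores(which = "all")
print(ncores_all)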

}
\seealso{
To get the set of available workers regardless of machine,
see \code{\link[=availableWorkers]{availableWorkers()}}.
}