% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/rslurm-package.R
\docType{package}
\name{rslurm-package}
\alias{rslurm-package}
\title{Introduction to the \code{rslurm} Package}
\description{
Send long-running or parallel jobs to a Slurm workload manager (i.e. cluster)
using the \code{\link{slurm_call}}, \code{\link{slurm_apply}}, or
\code{\link{slurm_map}} functions.
}
\section{Job submission}{

  
  This package includes three core functions used to send computations to a 
  Slurm cluster: 1) \code{\link{slurm_call}} executes a function using a 
  single set of parameters (passed as a list), 2) \code{\link{slurm_apply}} 
  evaluates a function in parallel for each row of parameters in a given 
  data frame, and 3) \code{\link{slurm_map}} evaluates a function in parallel
  for each element of a list. 
  The functions \code{slurm_apply} and \code{slurm_map} automatically split the 
  parameter rows or list elements into equal-size chunks, with each chunk 
  processed by a separate cluster node. 
  They use functions from the \code{\link[parallel:parallel-package]{parallel}} 
  package to parallelize computations across processors on a given node.
  
  The output of \code{slurm_apply}, \code{slurm_map}, or \code{slurm_call} 
  is a \code{slurm_job} object that serves as an input to the other functions in the package: 
  \code{\link{print_job_status}}, \code{\link{cancel_slurm}}, 
  \code{\link{get_slurm_out}} and \code{\link{cleanup_files}}.
}

\section{Function specification}{

  
  To be compatible with \code{\link{slurm_apply}}, a function may accept any 
  number of single-value parameters. The names of these parameters must match
  the column names of the \code{params} data frame supplied. There are no 
  restrictions on the types of parameters passed as a list to 
  \code{\link{slurm_call}} or \code{\link{slurm_map}}.
  
  If the function passed to \code{slurm_call} or \code{slurm_apply} depends on 
  R objects (data, custom helper functions) other than \code{params}, pass a 
  character vector of their names to the optional \code{global_objects} 
  argument.
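
  As an illustration only, the sketch below passes two objects by name. The 
  names \code{scale_factor}, \code{rescale}, \code{fglobal}, \code{pars_small} 
  and \code{sjob_glob} are hypothetical, and the call assumes a working Slurm 
  installation.

\preformatted{
library(rslurm)

# Hypothetical data object and helper defined in the calling session
scale_factor <- 2
rescale <- function(x) x * scale_factor

# The worker function uses the helper, which is not one of its parameters
fglobal <- function(par_m, par_sd) {
  rescale(rnorm(1, par_m, par_sd))
}

pars_small <- data.frame(par_m = 1:4, par_sd = 0.5)

# Name every object the function needs besides its parameters
sjob_glob <- slurm_apply(fglobal, pars_small,
                         global_objects = c("scale_factor", "rescale"))
}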
  
  When parallelizing a function, any error interrupts all remaining 
  calculations on the current node, so it may be useful to wrap expressions 
  that may generate errors in a \code{\link[base]{try}} or 
  \code{\link[base:conditions]{tryCatch}} call. This ensures that the 
  computation continues with the next parameter set after reporting the error.
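
  A minimal sketch of such a wrapper, in plain R, is shown below. The function 
  name \code{safe_ftest} and the choice to return \code{NA} values on failure 
  are assumptions made for illustration, not part of the package.

\preformatted{
# Hypothetical worker: returns NA values instead of stopping when one
# parameter set fails, so the remaining parameter sets can still run.
safe_ftest <- function(par_m, par_sd) {
  tryCatch({
    stopifnot(par_sd > 0)  # raises an error for an invalid parameter
    samp <- rnorm(1000, par_m, par_sd)
    c(s_m = mean(samp), s_sd = sd(samp))
  }, error = function(e) {
    # Report the error, then return a placeholder of the same shape
    message("Skipping parameter set: ", conditionMessage(e))
    c(s_m = NA_real_, s_sd = NA_real_)
  })
}
}

  Such a function could then be passed to \code{slurm_apply} in place of an 
  unprotected one.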
}

\section{Output Format}{

  
  The default output format for \code{get_slurm_out} (\code{outtype = "raw"})
  is a list where each element is the return value of one function call. If 
  the function passed to \code{slurm_apply} produces a vector output, you may
  use \code{outtype = "table"} to collect the output in a single data frame, 
  with one row per function call.
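
  The two formats can be requested as sketched below, assuming \code{sjob} is 
  a \code{slurm_job} object returned by an earlier \code{slurm_apply} call 
  whose function returns a vector. The names \code{res_raw} and 
  \code{res_tab} are illustrative.

\preformatted{
# List with one element per function call (the default)
res_raw <- get_slurm_out(sjob, outtype = "raw")

# Single data frame with one row per function call
res_tab <- get_slurm_out(sjob, outtype = "table")
}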
}

\section{Slurm Configuration}{

  
  Advanced options for the Slurm workload manager may accompany job submission
  by \code{\link{slurm_call}}, \code{\link{slurm_map}}, and \code{\link{slurm_apply}} 
  through the optional \code{slurm_options} argument. For example, passing
  \code{list(time = '1:30:00')} as this option limits the job to 1 hour and 30
  minutes. Some advanced configuration must be set through environment 
  variables. On a multi-cluster head node, for example, the \code{SLURM_CLUSTERS}
  environment variable must be set to direct jobs to a non-default cluster.
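
  Both mechanisms are sketched below under stated assumptions: the cluster 
  name \code{"cluster2"} is hypothetical, the worker function and parameter 
  values are placeholders, and the call requires a working Slurm installation.

\preformatted{
library(rslurm)

# Placeholder worker function and parameters
fsum <- function(a, b) a + b
pars_small <- data.frame(a = 1:4, b = 5:8)

# Direct jobs submitted from this session to a non-default cluster
# (hypothetical cluster name)
Sys.setenv(SLURM_CLUSTERS = "cluster2")

# Limit the job to 1 hour and 30 minutes
sjob_opt <- slurm_apply(fsum, pars_small,
                        slurm_options = list(time = "1:30:00"))
}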
}

\examples{

\dontrun{
# Create a data frame of mean/sd values for normal distributions 
pars <- data.frame(par_m = seq(-10, 10, length.out = 1000), 
                   par_sd = seq(0.1, 10, length.out = 1000))
                   
# Create a function to parallelize
ftest <- function(par_m, par_sd) {
  samp <- rnorm(10^7, par_m, par_sd)
  c(s_m = mean(samp), s_sd = sd(samp))
}

sjob1 <- slurm_apply(ftest, pars)
print_job_status(sjob1)
res <- get_slurm_out(sjob1, "table")
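# Sample estimates should match the input parameters up to sampling error;
# column names differ (par_m/par_sd vs. s_m/s_sd), so compare values only
all.equal(unname(as.matrix(pars)), unname(as.matrix(res)), tolerance = 0.01)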
cleanup_files(sjob1)
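
# A sketch of the other two submission functions, assuming the same cluster
# setup as above. The object names sjob_call, sjob_map, res_call and res_map
# are illustrative only.

# slurm_call runs ftest once with a single set of parameters passed as a list
sjob_call <- slurm_call(ftest, params = list(par_m = 0, par_sd = 1))
res_call <- get_slurm_out(sjob_call)
cleanup_files(sjob_call)

# slurm_map evaluates a function in parallel for each element of a list
# (here, a list of standard deviations)
sjob_map <- slurm_map(as.list(1:10), function(s) sd(rnorm(10^6, 0, s)))
res_map <- get_slurm_out(sjob_map)
cleanup_files(sjob_map)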
}
  
}