1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678
|
.. _commandRef:
.. _workflowOptions:
Commandline Options
===================
A quick way to see all of Toil's commandline options is by executing the following on a workflow language front-end::
$ toil-wdl-runner --help
Or a Toil Python workflow::
$ python3 example.py --help
For a basic toil workflow, Toil has one mandatory argument, the job store. All other arguments are optional.
The Config File
---------------
Instead of changing the arguments on the command line, Toil offers support for using a configuration file.
Options will be applied with priority:
1. Command line options
2. Environmental Variables
3. Config file values
a. Provided config file through ``--config``
b. Default config value in ``$HOME/.toil/default.yaml``
4. Defaults
You can manually generate an example configuration file to a path you select. To generate a configuration file, run::
$ toil config [filename].yaml
Then uncomment options as necessary and change/provide new values.
After editing the config file, you can run Toil with its settings by passing it on the command line::
$ python3 example.py --config=[filename].yaml
Alternatively, you can edit the default config file, which is located at ``$HOME/.toil/default.yaml``
If CLI options are used in addition to the configuration file, the CLI options will overwrite the configuration file
options. For example::
$ python3 example.py --config=[filename].yaml --defaultMemory 80Gi
This will result in a default memory per job of 80GiB no matter what is in the configuration file provided.
The Job Store
-------------
Running Toil workflows requires a file path or URL to a central location for all of the intermediate files for the workflow: the job store.
For ``toil-cwl-runner`` and ``toil-wdl-runner`` a job store can often be selected automatically or can be specified with the ``--jobStore`` option; Toil Python workflows generally require the job store as a positional command line argument.
To use the :ref:`Python quickstart <pyquickstart>` example,
if you're on a node that has a large **/scratch** volume, you can specify that the jobstore be created there by
executing: ``python3 HelloWorld.py /scratch/my-job-store``, or more explicitly,
``python3 HelloWorld.py file:/scratch/my-job-store``.
Syntax for specifying different job stores:
Local: ``file:job-store-name``
AWS: ``aws:region-here:job-store-name``
Google: ``google:projectID-here:job-store-name``
Different types of job store options can be found below.
.. _optionsRef:
Commandline Options
-------------------
**Core Toil Options**
Options to specify the location of the Toil workflow and turn on stats collation
about the performance of jobs.
--workDir WORKDIR Absolute path to directory where temporary files
generated during the Toil run should be placed.
Standard output and error from batch system jobs
(unless ``--noStdOutErr`` is set) will be placed in
this directory. A cache directory may be placed in this
directory. Temp files and folders will be placed in a
directory ``toil-<workflowID>`` within workDir. The
workflowID is generated by Toil and will be reported in
the workflow logs. Default is determined by the
variables (TMPDIR, TEMP, TMP) via mkdtemp. For CWL,
the temporary output directory is used instead
(see CWL option ``--tmp-outdir-prefix``). This
directory needs to exist on all machines running jobs;
if capturing standard output and error from batch
system jobs is desired, it will generally need to be on
a shared file system. When sharing a cache between
containers on a host, this directory must be shared
between the containers.
--coordinationDir COORDINATION_DIR
Absolute path to directory where Toil will keep state
and lock files. When sharing a cache between containers
on a host, this directory must be shared between the
containers.
--noStdOutErr Do not capture standard output and error from batch system jobs.
--stats Records statistics about the toil workflow to be used
by 'toil stats'.
--clean=STATE
Determines the deletion of the jobStore upon
completion of the program. Choices: 'always',
'onError','never', or 'onSuccess'. The ``--stats`` option
requires information from the jobStore upon completion
so the jobStore will never be deleted with that flag.
If you wish to be able to restart the run, choose
'never' or 'onSuccess'. Default is 'never' if stats is
enabled, and 'onSuccess' otherwise
--cleanWorkDir STATE
Determines deletion of temporary worker directory upon
completion of a job. Choices: 'always', 'onError', 'never',
or 'onSuccess'. Default = always. WARNING: This option
should be changed for debugging only. Running a full
pipeline with this option could fill your disk with
intermediate data.
--clusterStats FILEPATH
If enabled, writes out JSON resource usage statistics
to a file. The default location for this file is the
current working directory, but an absolute path can
also be passed to specify where this file should be
written. This option only applies when using scalable
batch systems.
--restart If ``--restart`` is specified then will attempt to restart
existing workflow at the location pointed to by the
``--jobStore`` option. Will raise an exception if the
workflow does not exist.
**Logging Options**
Toil hides stdout and stderr by default except in case of job failure. Log
levels in toil are based on priority from the logging module:
--logOff
Only CRITICAL log messages are shown.
Equivalent to ``--logLevel=OFF`` or ``--logLevel=CRITICAL``.
--logCritical
Only CRITICAL log messages are shown.
Equivalent to ``--logLevel=OFF`` or ``--logLevel=CRITICAL``.
--logError
Only ERROR, and CRITICAL log messages are shown.
Equivalent to ``--logLevel=ERROR``.
--logWarning
Only WARN, ERROR, and CRITICAL log messages are shown.
Equivalent to ``--logLevel=WARNING``.
--logInfo
All non-debugging-related log messages are shown.
Equivalent to ``--logLevel=INFO``.
--logDebug
Log messages at DEBUG level and above are shown.
Equivalent to ``--logLevel=DEBUG``.
--logTrace
Log messages at TRACE level and above are shown.
Equivalent to ``--logLevel=TRACE``.
--logLevel=LOGLEVEL
May be set to: ``OFF`` (or ``CRITICAL``),
``ERROR``, ``WARN`` (or ``WARNING``), ``INFO``, ``DEBUG``,
or ``TRACE``.
--logFile FILEPATH
Specifies a file path to write the logging output to.
--rotatingLogging
Turn on rotating logging, which prevents log files from
getting too big (set using ``--maxLogFileSize BYTESIZE``).
--maxLogFileSize BYTESIZE
The maximum size of a job log file to keep (in bytes),
log files larger than this will be truncated to the last
X bytes. Setting this option to zero will prevent any
truncation. Setting this option to a negative value will
truncate from the beginning. Default=100MiB
Sets the maximum log file size in bytes (``--rotatingLogging`` must be active).
--log-dir DIRPATH
For CWL and local file system only. Log stdout and stderr (if tool requests stdout/stderr) to the DIRPATH.
--logColors BOOL
Enable or disable colored logging. Default=True.
**Batch System Options**
--batchSystem BATCHSYSTEM
The type of batch system to run the job(s) with. Default = single_machine.
--disableAutoDeployment
Should auto-deployment of Toil Python workflows be
deactivated? If True, the workflow's Python code should
be present at the same location on all workers. Default = False.
--maxJobs MAXJOBS
Specifies the maximum number of jobs to submit to the
backing scheduler at once. Not supported on Mesos or
AWS Batch. Use 0 for unlimited. Defaults to unlimited.
--maxLocalJobs MAXLOCALJOBS
Specifies the maximum number of housekeeping jobs to
run simultaneously on the local system. Use 0 for
unlimited. Defaults to the number of local cores.
--manualMemArgs Do not add the default arguments: 'hv=MEMORY' &
'h_vmem=MEMORY' to the qsub call, and instead rely on
TOIL_GRIDGENGINE_ARGS to supply alternative arguments.
Requires that TOIL_GRIDGENGINE_ARGS be set.
--memoryIsProduct
If the batch system understands requested memory as a product of the requested
memory and the number of cores, set this flag to properly allocate memory. This
can be fairly common with grid engine clusters (Ex: SGE, PBS, Torque).
--runCwlInternalJobsOnWorkers
Whether to run CWL internal jobs (e.g. CWLScatter) on
the worker nodes instead of the primary node. If false
(default), then all such jobs are run on the primary node.
Setting this to true can speed up the pipeline for very
large workflows with many sub-workflows and/or scatters,
provided that the worker pool is large enough.
--statePollingWait STATEPOLLINGWAIT
Time, in seconds, to wait before doing a scheduler
query for job state. Return cached results if within
the waiting period. Only works for grid engine batch
systems such as gridengine, htcondor, torque, slurm,
and lsf.
--statePollingTimeout STATEPOLLINGTIMEOUT
Time, in seconds, to retry against a broken scheduler.
Only works for grid engine batch systems such as
gridengine, htcondor, torque, slurm, and lsf.
--batchLogsDir BATCHLOGSDIR
Directory to tell the backing batch system to log into.
Should be available on both the leader and the workers,
if the backing batch system writes logs to the worker
machines' filesystems, as many HPC schedulers do. If
unset, the Toil work directory will be used. Only
works for grid engine batch systems such as gridengine,
htcondor, torque, slurm, and lsf.
--mesosEndpoint MESOSENDPOINT
The host and port of the Mesos server separated by a
colon. (default: <leader IP>:5050)
--mesosFrameworkId MESOSFRAMEWORKID
Use a specific Mesos framework ID.
--mesosRole MESOSROLE
Use a Mesos role.
--mesosName MESOSNAME
The Mesos name to use. (default: toil)
--scale SCALE A scaling factor to change the value of all submitted
tasks' submitted cores. Used in single_machine batch
system. Useful for running workflows on smaller
machines than they were designed for, by setting a
value less than 1. (default: 1)
--slurmAllocateMem SLURM_ALLOCATE_MEM
If False, do not use --mem. Used as a workaround for
Slurm clusters that reject jobs with memory
allocations.
--slurmTime SLURM_TIME
Slurm job time limit, in [DD-]HH:MM:SS format.
--slurmPartition SLURM_PARTITION
Partition to send Slurm jobs to.
--slurmGPUPartition SLURM_GPU_PARTITION
Partition to send Slurm jobs to if they ask for GPUs.
--slurmPE SLURM_PE Special partition to send Slurm jobs to if they ask
for more than 1 CPU. Useful for Slurm clusters that do
not offer a partition accepting both single-core and
multi-core jobs.
--slurmArgs SLURM_ARGS
Extra arguments to pass to Slurm.
--kubernetesHostPath KUBERNETES_HOST_PATH
Path on Kubernetes hosts to use as shared inter-pod temp
directory.
--kubernetesOwner KUBERNETES_OWNER
Username to mark Kubernetes jobs with.
--kubernetesServiceAccount KUBERNETES_SERVICE_ACCOUNT
Service account to run jobs as.
--kubernetesPodTimeout KUBERNETES_POD_TIMEOUT
Seconds to wait for a scheduled Kubernetes pod to
start running. (default: 120s)
--kubernetesPrivileged BOOL
Whether to allow Kubernetes pods to run as privileged. This can be
used to enable FUSE mounts for faster runtimes with Singularity.
When launching Toil-managed clusters, this will be set to true by --allowFuse.
(default: False)
--kubernetesPodSecurityContext KUBERNETES_POD_SECURITY_CONTEXT
Path to a YAML defining a pod security context to apply to all pods.
--kubernetesSecurityContext KUBERNETES_SECURITY_CONTEXT
Path to a YAML defining a security context to apply to all containers.
--awsBatchRegion AWS_BATCH_REGION
The AWS region containing the AWS Batch queue to submit
to.
--awsBatchQueue AWS_BATCH_QUEUE
The name or ARN of the AWS Batch queue to submit to.
--awsBatchJobRoleArn AWS_BATCH_JOB_ROLE_ARN
The ARN of an IAM role to run AWS Batch jobs as, so they
can e.g. access a job store. Must be assumable by
ecs-tasks.amazonaws.com
**Data Storage Options**
Allows configuring Toil's data storage.
--symlinkImports BOOL
When using a filesystem based job store, CWL input files
are by default symlinked in. Setting this option to True
instead copies the files into the job store, which may
protect them from being modified externally. When set
to False and as long as caching is enabled, Toil will
protect the file automatically by changing the permissions
to read-only. (Default=True)
--moveOutputs BOOL
When using a filesystem based job store, output files
are by default moved to the output directory, and a
symlink to the moved exported file is created at the
initial location. Setting this option to True instead
copies the files into the output directory. Applies to
filesystem-based job stores only. (Default=False)
--caching BOOL
Enable or disable worker level file caching. Set to "true" if
caching is desired. By default, caching is enabled on supported
batch systems. Does not affect CWL or WDL task caching.
--symlinkJobStoreReads BOOL
Allow reads and container mounts from a JobStore's
shared filesystem directly via symlink. Can be turned
off if the shared filesystem can't support the IO load
of all the jobs reading from it at once, and you want
to use ``--caching=True`` to make jobs on each node
read from node-local cache storage. (Default=True)
**Autoscaling Options**
Allows the specification of the minimum and maximum number of nodes in an
autoscaled cluster, as well as parameters to control the level of provisioning.
--provisioner CLOUDPROVIDER
The provisioner for cluster auto-scaling. This is the
main Toil ``--provisioner`` option, and defaults to None
for running on single_machine and non-auto-scaling batch
systems. The currently supported choices are 'aws' or
'gce'.
--nodeTypes NODETYPES
Specifies a list of comma-separated node types, each of which is
composed of slash-separated instance types, and an optional spot
bid set off by a colon, making the node type preemptible. Instance
types may appear in multiple node types, and the same node type
may appear as both preemptible and non-preemptible.
Valid argument specifying two node types:
c5.4xlarge/c5a.4xlarge:0.42,t2.large
Node types:
c5.4xlarge/c5a.4xlarge:0.42 and t2.large
Instance types:
c5.4xlarge, c5a.4xlarge, and t2.large
Semantics:
Bid $0.42/hour for either c5.4xlarge or c5a.4xlarge instances,
treated interchangeably, while they are available at that price,
and buy t2.large instances at full price
--minNodes MINNODES Minimum number of nodes of each type in the cluster,
if using auto-scaling. This should be provided as a
comma-separated list of the same length as the list of
node types. default=0
--maxNodes MAXNODES Maximum number of nodes of each type in the cluster,
if using autoscaling, provided as a comma-separated
list. The first value is used as a default if the list
length is less than the number of nodeTypes.
default=10
--targetTime TARGETTIME
Sets how rapidly you aim to complete jobs in seconds.
Shorter times mean more aggressive parallelization.
The autoscaler attempts to scale up/down so that it
expects all queued jobs will complete within targetTime
seconds. (Default: 1800)
--betaInertia BETAINERTIA
A smoothing parameter to prevent unnecessary
oscillations in the number of provisioned nodes. This
controls an exponentially weighted moving average of the
estimated number of nodes. A value of 0.0 disables any
smoothing, and a value of 0.9 will smooth so much that
few changes will ever be made. Must be between 0.0 and
0.9. (Default: 0.1)
--scaleInterval SCALEINTERVAL
The interval (seconds) between assessing if the scale of
the cluster needs to change. (Default: 60)
--preemptibleCompensation PREEMPTIBLECOMPENSATION
The preference of the autoscaler to replace
preemptible nodes with non-preemptible nodes, when
preemptible nodes cannot be started for some reason.
Defaults to 0.0. This value must be between 0.0 and
1.0, inclusive. A value of 0.0 disables such
compensation, a value of 0.5 compensates two missing
preemptible nodes with a non-preemptible one. A value
of 1.0 replaces every missing pre-emptable node with a
non-preemptible one.
--nodeStorage NODESTORAGE
Specify the size of the root volume of worker nodes
when they are launched in gigabytes. You may want to
set this if your jobs require a lot of disk space. The
default value is 50.
--nodeStorageOverrides NODESTORAGEOVERRIDES
Comma-separated list of nodeType:nodeStorage that are used
to override the default value from ``--nodeStorage`` for the
specified nodeType(s). This is useful for heterogeneous
jobs where some tasks require much more disk than others.
--metrics Enable the prometheus/grafana dashboard for monitoring
CPU/RAM usage, queue size, and issued jobs.
--assumeZeroOverhead Ignore scheduler and OS overhead and assume jobs can use every
last byte of memory and disk on a node when autoscaling.
**Service Options**
Allows the specification of the maximum number of service jobs in a cluster. By
keeping this limited we can avoid nodes occupied with services causing deadlocks.
(Not for CWL).
--maxServiceJobs MAXSERVICEJOBS
The maximum number of service jobs that can be run
concurrently, excluding service jobs running on
preemptible nodes. default=9223372036854775807
--maxPreemptibleServiceJobs MAXPREEMPTIBLESERVICEJOBS
The maximum number of service jobs that can run
concurrently on preemptible nodes.
default=9223372036854775807
--deadlockWait DEADLOCKWAIT
Time, in seconds, to tolerate the workflow running only
the same service jobs, with no jobs to use them, before
declaring the workflow to be deadlocked and stopping.
default=60
--deadlockCheckInterval DEADLOCKCHECKINTERVAL
Time, in seconds, to wait between checks to see if the
workflow is stuck running only service jobs, with no
jobs to use them. Should be shorter than
``--deadlockWait``. May need to be increased if the batch
system cannot enumerate running jobs quickly enough, or
if polling for running jobs is placing an unacceptable
load on a shared cluster. default=30
**Resource Options**
The options to specify default cores/memory requirements (if not specified by
the jobs themselves), and to limit the total amount of memory/cores requested
from the batch system.
--defaultMemory INT The default amount of memory to request for a job.
Only applicable to jobs that do not specify an
explicit value for this requirement. Standard suffixes
like K, Ki, M, Mi, G or Gi are supported. Default is
2.0Gi
--defaultCores FLOAT The default number of CPU cores to dedicate a job.
Only applicable to jobs that do not specify an
explicit value for this requirement. Fractions of a
core (for example 0.1) are supported on some batch
systems, namely Mesos and single_machine. Default is
1.0
--defaultDisk INT The default amount of disk space to dedicate a job.
Only applicable to jobs that do not specify an
explicit value for this requirement. Standard suffixes
like K, Ki, M, Mi, G or Gi are supported. Default is
2.0Gi
--defaultAccelerators ACCELERATOR
The default amount of accelerators to request for a
job. Only applicable to jobs that do not specify an
explicit value for this requirement. Each accelerator
specification can have a type (gpu [default], nvidia,
amd, cuda, rocm, opencl, or a specific model like
nvidia-tesla-k80), and a count [default: 1]. If both a
type and a count are used, they must be separated by a
colon. If multiple types of accelerators are used, the
specifications are separated by commas. Default is [].
--defaultPreemptible BOOL
Make all jobs able to run on preemptible (spot) nodes
by default.
--maxCores INT The maximum number of CPU cores to request from the
batch system at any one time. Standard suffixes like
K, Ki, M, Mi, G or Gi are supported.
--maxMemory INT The maximum amount of memory to request from the batch
system at any one time. Standard suffixes like K, Ki,
M, Mi, G or Gi are supported.
--maxDisk INT The maximum amount of disk space to request from the
batch system at any one time. Standard suffixes like
K, Ki, M, Mi, G or Gi are supported.
**Options for rescuing/killing/restarting jobs.**
The options for jobs that either run too long/fail or get lost (some batch
systems have issues!).
--retryCount INT
Number of times to retry a failing job before giving
up and labeling job failed. default=1
--stopOnFirstFailure BOOL
Stop the workflow at the first complete job failure.
--enableUnlimitedPreemptibleRetries
If set, preemptible failures (or any failure due to an
instance getting unexpectedly terminated) will not count
towards job failures and ``--retryCount``.
--doubleMem If set, batch jobs which die due to reaching memory
limit on batch schedulers will have their memory
doubled and they will be retried. The remaining
retry count will be reduced by 1. Currently only
supported by LSF. default=False.
--maxJobDuration INT
Maximum runtime of a job (in seconds) before we kill
it (this is a lower bound, and the actual time before
killing the job may be longer).
--rescueJobsFrequency INT
Period of time to wait (in seconds) between checking
for missing/overlong jobs, that is jobs which get lost
by the batch system. Expert parameter.
--jobStoreTimeout FLOAT
Maximum time (in seconds) to wait for a job's update to
the job store before declaring it failed.
**Log Management Options**
--maxLogFileSize MAXLOGFILESIZE
The maximum size of a job log file to keep (in bytes),
log files larger than this will be truncated to the
last X bytes. Setting this option to zero will prevent
any truncation. Setting this option to a negative
value will truncate from the beginning. Default=62.5 K
--writeLogs FILEPATH
Write worker logs received by the leader into their
own files at the specified path. Any non-empty standard
output and error from failed batch system jobs will also
be written into files at this path. The current working
directory will be used if a path is not specified
explicitly. Note: By default only the logs of failed
jobs are returned to leader. Set log level to 'debug' or
enable ``--writeLogsFromAllJobs`` to get logs back from
successful jobs, and adjust ``--maxLogFileSize`` to
control the truncation limit for worker logs.
--writeLogsGzip FILEPATH
Identical to ``--writeLogs`` except the logs files are
gzipped on the leader.
--writeMessages FILEPATH
File to send messages from the leader's message bus to.
--realTimeLogging BOOL
Enable real-time logging from workers to leader.
**Miscellaneous Options**
--disableChaining BOOL
Disables chaining of jobs (chaining uses one job's
resource allocation for its successor job if
possible).
--disableJobStoreChecksumVerification
Disables checksum verification for files transferred
to/from the job store. Checksum verification is a safety
check to ensure the data is not corrupted during transfer.
Currently only supported for non-streaming AWS files
--sseKey SSEKEY Path to file containing 32 character key to be used
for server-side encryption on awsJobStore or
googleJobStore. SSE will not be used if this flag is
not passed.
--setEnv NAME, -e NAME
NAME=VALUE or NAME, -e NAME=VALUE or NAME are also valid.
Set an environment variable early on in the worker. If
VALUE is omitted, it will be looked up in the current
environment. Independently of this option, the worker
will try to emulate the leader's environment before
running a job, except for some variables known to vary
across systems. Using this option, a variable can be
injected into the worker process itself before it is
started.
--servicePollingInterval SERVICEPOLLINGINTERVAL
Interval of time service jobs wait between polling for
the existence of the keep-alive flag (default=60)
--forceDockerAppliance
Disables sanity checking the existence of the docker
image specified by TOIL_APPLIANCE_SELF, which Toil uses
to provision mesos for autoscaling.
--statusWait INT Seconds to wait between reports of running jobs.
(default=3600)
--disableProgress Disables the progress bar shown when standard error is
a terminal.
--publishWorkflowMetrics PUBLISHWORKFLOWMETRICS
Whether to publish workflow metrics reports (including
unique workflow and task run IDs, job names, and
version and Toil feature use information) to Dockstore
when a workflow completes. Selecting "current" will
publish metrics for the current workflow. Selecting
"all" will also publish prior workflow runs from the
Toil history database, even if they themselves were run
with "no". Note that once published, workflow metrics
CANNOT be deleted or un-published; they will stay
published forever!
**Debug Options**
Debug options for finding problems or helping with testing.
--debugWorker Experimental no forking mode for local debugging.
Specifically, workers are not forked and stderr/stdout
are not redirected to the log. (default=False)
--disableWorkerOutputCapture
Let worker output go to worker's standard out/error
instead of per-job logs.
--badWorker BADWORKER
For testing purposes randomly kill ``--badWorker``
proportion of jobs using SIGKILL. (Default: 0.0)
--badWorkerFailInterval BADWORKERFAILINTERVAL
When killing the job pick uniformly within the interval
from 0.0 to ``--badWorkerFailInterval`` seconds after the
worker starts. (Default: 0.01)
--kill_polling_interval KILL_POLLING_INTERVAL
Interval of time (in seconds) the leader waits between
polling for the kill flag inside the job store set by
the "toil kill" command. (default=5)
Restart Option
--------------
In the event of failure, Toil can resume the pipeline by adding the argument
``--restart`` and rerunning the workflow. Toil Python workflows (but not CWL or WDL
workflows) can even be edited and resumed, which is useful for development or
troubleshooting.
Running Workflows with Services
-------------------------------
Toil supports jobs, or clusters of jobs, that run as *services* to other
*accessor* jobs. Example services include server databases or Apache Spark
Clusters. As service jobs exist to provide services to accessor jobs their
runtime is dependent on the concurrent running of their accessor jobs. The dependencies
between services and their accessor jobs can create potential deadlock scenarios,
where the running of the workflow hangs because only service jobs are being
run and their accessor jobs can not be scheduled because of too limited resources
to run both simultaneously. To cope with this situation Toil attempts to
schedule services and accessors intelligently, however to avoid a deadlock
with workflows running service jobs it is advisable to use the following parameters:
* ``--maxServiceJobs``: The maximum number of service jobs that can be run concurrently, excluding service jobs running on preemptible nodes.
* ``--maxPreemptibleServiceJobs``: The maximum number of service jobs that can run concurrently on preemptible nodes.
Specifying these parameters so that at a maximum cluster size there will be
sufficient resources to run accessors in addition to services will ensure that
such a deadlock can not occur.
If too low a limit is specified then a deadlock can occur in which toil can
not schedule sufficient service jobs concurrently to complete the workflow.
Toil will detect this situation if it occurs and throw a
:class:`toil.DeadlockException` exception. Increasing the cluster size
and these limits will resolve the issue.
Setting Options directly in a Python Workflow
---------------------------------------------
It's good to remember that commandline options can be overridden in the code of a Python workflow. For example,
:func:`toil.job.Job.Runner.getDefaultOptions` can be used to get the default Toil options, ignoring what was passed on the command line. In this example,
this is used to ignore command-line options and always run with the "./toilWorkflow" directory as the jobstore:
.. code-block:: python
options = Job.Runner.getDefaultOptions("./toilWorkflow") # Get the options object
with Toil(options) as toil:
toil.start(Job()) # Run the root job
However, each option can be explicitly set within the workflow by modifying the options object. In this example, we are setting
``logLevel = "DEBUG"`` (all log statements are shown) and ``clean="ALWAYS"`` (always delete the jobstore) like so:
.. code-block:: python
options = Job.Runner.getDefaultOptions("./toilWorkflow") # Get the options object
options.logLevel = "DEBUG" # Set the log level to the debug level.
options.clean = "ALWAYS" # Always delete the jobStore after a run
with Toil(options) as toil:
toil.start(Job()) # Run the root job
However, the usual incantation is to accept commandline args from the user with the following:
.. code-block:: python
parser = Job.Runner.getDefaultArgumentParser() # Get the parser
options = parser.parse_args() # Parse user args to create the options object
with Toil(options) as toil:
toil.start(Job()) # Run the root job
We can also have code in the workflow to overwrite user supplied arguments:
.. code-block:: python
parser = Job.Runner.getDefaultArgumentParser() # Get the parser
options = parser.parse_args() # Parse user args to create the options object
options.logLevel = "DEBUG" # Set the log level to the debug level.
options.clean = "ALWAYS" # Always delete the jobStore after a run
with Toil(options) as toil:
toil.start(Job()) # Run the root job
|