Introduction
============

To be clear, our concern throughout this chapter is with commercial
services which rent computational resources over the Internet at short
notice and charge in small increments (by the minute or the hour).
Currently, the :tool:`condor_annex` tool supports only AWS.  AWS can start booting
a new virtual machine as quickly as a few seconds after the request;
barring hardware failure, you will be able to continue renting that VM
until you stop paying the hourly charge.  The other cloud services are
broadly similar.

If you already have access to the Grid, you may wonder why you would
want to begin cloud computing.  The cloud services offer two major
advantages over the Grid: first, cloud resources are typically available
more quickly and in greater quantity than from the Grid; and second,
because cloud resources are virtual machines, they are considerably more
customizable than Grid resources.  The major disadvantages are, of
course, cost and complexity (although we hope that :tool:`condor_annex`
reduces the latter).

We illustrate these advantages with what we anticipate will be the most
common uses for :tool:`condor_annex`.

Use Case: Deadlines
-------------------

With the ability to acquire computational resources in seconds or
minutes and retain them for days or weeks, it becomes possible to
rapidly adjust the size (and cost) of an HTCondor pool. Giving this
ability to the end-user avoids the problems of deciding who will pay for
expanding the pool and when to do so. We anticipate that the usual cause
for doing so will be deadlines; the end-user has the best knowledge of
their own deadlines and how much, in monetary terms, it's worth to
complete their work by that deadline.
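
Most of the deadline trade-off comes down to arithmetic. The following Python sketch is not part of HTCondor or :tool:`condor_annex`. It estimates how many cloud instances could finish a given amount of remaining work before a deadline, and what renting them would cost. The function names, the workload figures, and the hourly price are all hypothetical. The sketch also assumes perfectly parallel work that keeps every core busy, and it ignores boot time, data staging, and billing-increment rounding.

.. code-block:: python

    import math


    def instances_needed(remaining_core_hours, hours_until_deadline, cores_per_instance):
        """Return the smallest instance count that finishes the work in time.

        Hypothetical helper: assumes perfectly parallel work and ignores
        VM boot time and data staging.
        """
        if hours_until_deadline <= 0:
            raise ValueError("the deadline has already passed")
        core_hours_per_instance = cores_per_instance * hours_until_deadline
        return math.ceil(remaining_core_hours / core_hours_per_instance)


    def estimated_cost(instances, hours, hourly_price):
        """Rental cost if every instance runs for the whole period (hypothetical)."""
        return instances * hours * hourly_price


    # Illustrative inputs only: 1,000 core-hours of work left, 10 hours until
    # the deadline, 4-core instances at a made-up $0.20 per instance-hour.
    n = instances_needed(1000, 10, 4)
    print(n, estimated_cost(n, 10, 0.20))  # prints: 25 50.0

With these illustrative numbers, 25 four-core instances running for ten hours would cost about $50. Weighing that figure against the value of meeting the deadline is exactly the judgment the end-user is best placed to make.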

Use Case: Capabilities
----------------------

Cloud services may offer (virtual) hardware in configurations
unavailable in the local pool, or in quantities that it would be
prohibitively expensive to provide on an ongoing basis. Examples (as of
2017) include GPU-based computation, or computations requiring a
terabyte of main memory. A cloud service may also offer fast and
cloud-local storage for shared data, which may have substantial
performance benefits for some workflows. Some cloud providers (for
example, AWS) have pre-populated this storage with common public
datasets, to further ease adoption.

By using cloud resources, an HTCondor pool administrator may also
experiment with or temporarily offer different software and
configurations. For example, a pool may be configured with a maximum job
runtime, perhaps to reduce the latency of fair-share adjustments or to
protect against hung jobs. Adding cloud resources which permit
longer-running jobs may be the least-disruptive way to accommodate a user
whose jobs need more time.

Use Case: Capacities
--------------------

It may be possible for an HTCondor administrator to lower the cost of
their pool by sizing the local hardware for typical rather than peak
demand, which increases its utilization, and meeting peak demand with
cloud computing.

Use Case: Experimental Convenience
----------------------------------

Although you can experiment with many different HTCondor configurations using
:tool:`condor_annex` and HTCondor running as a normal user, some configurations may
require elevated privileges.  In other situations, you may not be able to create
an unprivileged HTCondor pool on a machine because that would violate the
machine's acceptable-use policies, or because you can't change the firewall, or
because you'd use too much bandwidth.  In those cases, you can instead
"seed" the cloud with a single-node HTCondor installation and expand it using
:tool:`condor_annex`.  See :ref:`condor_in_the_cloud` for instructions.