File: configuration.rst

package info (click to toggle)
oar 2.6.1-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 9,700 kB
  • sloc: perl: 34,517; sh: 6,041; ruby: 5,840; sql: 3,390; cpp: 2,277; makefile: 402; php: 365; ansic: 335; python: 275; exp: 23
file content (355 lines) | stat: -rw-r--r-- 10,543 bytes parent folder | download | duplicates (7)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
Configuration file
==================

Be careful, the syntax of this file must be bash compliant(so after editing
you must be able to launch in bash 'source /etc/oar.conf' and have variables
assigned).
Each configuration tag found in /etc/oar.conf is now described:

  - Database type : you can use a MySQL or a PostgreSQL database (tags are
    "mysql" or "Pg")::

      DB_TYPE=Pg

  - Database hostname::

      DB_HOSTNAME=127.0.0.1

	- Database port::

      DB_PORT=5432

  - Database base name::

      DB_BASE_NAME=oar

  - DataBase user name::

      DB_BASE_LOGIN=oar

  - DataBase user password::

      DB_BASE_PASSWD=oar

.. _DB_BASE_LOGIN_RO:

  - DataBase read only user name::

      DB_BASE_LOGIN_RO=oar_ro

.. _DB_BASE_PASSWD_RO:

  - DataBase read only user password::

      DB_BASE_PASSWD_RO=oar_ro


  - OAR server hostname::

      SERVER_HOSTNAME=localhost

.. _SERVER_PORT:

  - OAR server port::

      SERVER_PORT=6666

  - When the user does not specify a -l option then oar use this::

      OARSUB_DEFAULT_RESOURCES="/resource_id=1"

  - Force use of job key even if --use-job-key or -k is not set in oarsub::

      OARSUB_FORCE_JOB_KEY="no"

.. _DEPLOY_HOSTNAME:

  - Specify where we are connected in the deploy queue(the node to connect
    to when the job is in the deploy queue)::

      DEPLOY_HOSTNAME="127.0.0.1"

.. _COSYSTEM_HOSTNAME:

  - Specify where we are connected with a job of the cosystem type::

      COSYSTEM_HOSTNAME="127.0.0.1"

.. _DETACH_JOB_FROM_SERVER:

  - Set the directory where OAR will store its temporary files on each nodes
    of the cluster. This value MUST be the same in all oar.conf on
    all nodes::

      OAR_RUNTIME_DIRECTORY="/tmp/oar_runtime"

  - Specify the database field to use to fill the file on the first node of
    the job in $OAR_NODE_FILE (default is 'network_address'). Only resources
    with type=default are displayed in this file::

      NODE_FILE_DB_FIELD="network_address"

  - Specify the database field that will be considered to fill the node file
    used by the user on the first node of the job. for each different value
    of this field then OAR will put 1 line in the node file(by default "cpu")::

      NODE_FILE_DB_FIELD_DISTINCT_VALUES="core"

  - By default OAR uses the ping command to detect if nodes are down or not.
    To enhance this diagnostic you can specify one of these other methods (
    give the complete command path):

      * OAR taktuk::

          PINGCHECKER_TAKTUK_ARG_COMMAND="-t 30 broadcast exec [ true ]"

        If you use sentinelle.pl then you must use this tag::

          PINGCHECKER_SENTINELLE_SCRIPT_COMMAND="/var/lib/oar/sentinelle.pl -t 30 -w 20"

      * OAR fping::

          PINGCHECKER_FPING_COMMAND="/usr/bin/fping -q"

      * OAR nmap : it will test to connect on the ssh port (22)::

          PINGCHECKER_NMAP_COMMAND="/usr/bin/nmap -p 22 -n -T5"

      * OAR generic : a specific script may be used instead of ping to check
        aliveness of nodes. The script must return bad nodes on STDERR (1 line
        for a bad node and it must have exactly the same name that OAR has
        given in argument of the command)::

          PINGCHECKER_GENERIC_COMMAND="/path/to/command arg1 arg2"

  - OAR log level: 3(debug+warnings+errors), 2(warnings+errors), 1(errors)::

      LOG_LEVEL=2

  - OAR log file::

      LOG_FILE="/var/log/oar.log"

  - If you want to debug oarexec on nodes then affect 1 (only effective if
    DETACH_JOB_FROM_SERVER = 1)::

      OAREXEC_DEBUG_MODE=0

.. _ACCOUNTING_WINDOW:

  - Set the granularity of the OAR accounting feature (in seconds). Default is
    1 day (86400s)::

      ACCOUNTING_WINDOW="86400"

.. _MAIL:

  - OAR informations may be notified by email to the administror.
    Set accordingly to your configuration the next lines to activate
    this feature::

      MAIL_SMTP_SERVER="smtp.serveur.com"
      MAIL_RECIPIENT="user@domain.com"
      MAIL_SENDER="oar@domain.com"

  - Set the timeout for the prologue and epilogue execution on computing
    nodes::

      PROLOGUE_EPILOGUE_TIMEOUT=60

  - Files to execute before and after each job on the first computing node
    (by default nothing is executed)::

      PROLOGUE_EXEC_FILE="/path/to/prog"
      EPILOGUE_EXEC_FILE="/path/to/prog"

  - Set the timeout for the prologue and epilogue execution on the OAR server::

      SERVER_PROLOGUE_EPILOGUE_TIMEOUT=60

.. _SERVER_SCRIPT_EXEC_FILE:

  - Files to execute before and after each job on the OAR server
    (by default nothing is executed)::

      SERVER_PROLOGUE_EXEC_FILE="/path/to/prog"
      SERVER_EPILOGUE_EXEC_FILE="/path/to/prog"

  - Set the frequency for checking Alive and Suspected resources::

      FINAUD_FREQUENCY=300

.. _DEAD_SWITCH_TIME:

  - Set time after which resources become Dead (default is 0 and it means
    never)::

      DEAD_SWITCH_TIME=600

.. _SCHEDULER_TIMEOUT:

  - Maximum of seconds used by a scheduler::

      SCHEDULER_TIMEOUT=20

  - Time to wait when a reservation has not got all resources that it has
    reserved (some resources could have become Suspected or Absent since the
    job submission) before to launch the job in the remaining resources::

      RESERVATION_WAITING_RESOURCES_TIMEOUT=300

.. _SCHEDULER_JOB_SECURITY_TIME:

  - Time to add between each jobs (time for administration tasks or time to
    let computers to reboot)::

      SCHEDULER_JOB_SECURITY_TIME=1

.. _SCHEDULER_GANTT_HOLE_MINIMUM_TIME:

  - Minimum time in seconds that can be considered like a hole where a job
    could be scheduled in::

      SCHEDULER_GANTT_HOLE_MINIMUM_TIME=300

.. _SCHEDULER_RESOURCE_ORDER:

  - You can add an order preference on resource assigned by the system(SQL
    ORDER syntax)::

      SCHEDULER_RESOURCE_ORDER="switch ASC, network_address DESC, resource_id ASC"

.. _SCHEDULER_RESOURCES_ALWAYS_ASSIGNED_TYPE:

  - You can specify resources from a resource type that will be always assigned for
    each job (for example: enable all jobs to be able to log on the cluster
    frontales).
    For more information, see the FAQ::

      SCHEDULER_RESOURCES_ALWAYS_ASSIGNED_TYPE="42 54 12 34"

  - This says to the scheduler to treate resources of these types, where there is
    a suspended job, like free ones. So some other jobs can be scheduled on these
    resources. (list resource types separate with spaces; Default value is
    nothing so no other job can be scheduled on suspended job resources)::

      SCHEDULER_AVAILABLE_SUSPENDED_RESOURCE_TYPE="default licence vlan"

  - Name of the perl script that manages suspend/resume. You have to install your
    script in $OARDIR and give only the name of the file without the entire path.
    (default is suspend_resume_manager.pl)::

      SUSPEND_RESUME_FILE="suspend_resume_manager.pl"

.. _JUST_AFTER_SUSPEND_EXEC_FILE:
.. _JUST_BEFORE_RESUME_EXEC_FILE:

  - Files to execute just after a job was suspended and just before a job was
    resumed::

      JUST_AFTER_SUSPEND_EXEC_FILE="/path/to/prog"
      JUST_BEFORE_RESUME_EXEC_FILE="/path/to/prog"

  - Timeout for the two previous scripts::

      SUSPEND_RESUME_SCRIPT_TIMEOUT=60

.. _JOB_RESOURCE_MANAGER_PROPERTY_DB_FIELD:

  - Indicate the name of the database field that contains the cpu number of
    the node. If this option is set then users must use oarsh instead of
    ssh to walk on each nodes that they have reserved via oarsub.
    ::

      JOB_RESOURCE_MANAGER_PROPERTY_DB_FIELD=cpuset

.. _JOB_RESOURCE_MANAGER_FILE:

  - Name of the perl script that manages cpuset. You have to install your
    script in $OARDIR and give only the name of the file without the
    entire path.
    (default is cpuset_manager.pl which handles the linux kernel cpuset)
    ::

      JOB_RESOURCE_MANAGER_FILE="cpuset_manager.pl"

.. _JOB_RESOURCE_MANAGER_JOB_UID_TYPE:

  - Resource "type" DB field to use if you want to enable the job uid feature.
    (create a unique user id per job on each nodes of the job)
    ::

      JOB_RESOURCE_MANAGER_JOB_UID_TYPE="userid"

.. _TAKTUK_CMD:

  - If you have installed taktuk and want to use it to manage cpusets
    then give the full command path (with your options except "-m" and "-o"
    and "-c").
    You don't also have to give any taktuk command.(taktuk version must be >=
    3.6)
    ::

      TAKTUK_CMD="/usr/bin/taktuk -s"

  - If you want to manage nodes to be started and stoped. OAR gives you this
    API:

.. _SCHEDULER_NODE_MANAGER_WAKE_UP_CMD:

    * When OAR scheduler wants some nodes to wake up then it launches this
      command and puts on its STDIN the list of nodes to wake up (one hostname
      by line).The scheduler looks at *available_upto* field in the
      :ref:`database-resources-anchor`
      table to know if the node will be started for enough time::

        SCHEDULER_NODE_MANAGER_WAKE_UP_CMD="/path/to/the/command with your args"

.. _SCHEDULER_NODE_MANAGER_SLEEP_CMD:

    * When OAR considers that some nodes can be shut down, it launches this
      command and puts the node list on its STDIN(one hostname by line)::

        SCHEDULER_NODE_MANAGER_SLEEP_CMD="/path/to/the/command args"

.. _SCHEDULER_NODE_MANAGER_IDLE_TIME:

      + Parameters for the scheduler to decide when a node is idle(number of
        seconds since the last job was terminated on the nodes)::

          SCHEDULER_NODE_MANAGER_IDLE_TIME=600

.. _SCHEDULER_NODE_MANAGER_SLEEP_TIME:

      + Parameters for the scheduler to decide if a node will have enough time
        to sleep(number of seconds before the next job)::

          SCHEDULER_NODE_MANAGER_SLEEP_TIME=600

.. _OPENSSH_CMD:

  - Command to use to connect to other nodes (default is "ssh" in the PATH)
    ::

      OPENSSH_CMD="/usr/bin/ssh"

  - These are configuration tags for OAR in the desktop-computing mode::

      DESKTOP_COMPUTING_ALLOW_CREATE_NODE=0
      DESKTOP_COMPUTING_EXPIRY=10
      STAGEOUT_DIR="/var/lib/oar/stageouts/"
      STAGEIN_DIR="/var/lib/oar/stageins"
      STAGEIN_CACHE_EXPIRY=144

  - This variable must be set to enable the use of oarsh from a frontale node.
    Otherwise you must not set this variable if you are not on a frontale::

      OARSH_OARSTAT_CMD="/usr/bin/oarstat"

.. _OARSH_OPENSSH_DEFAULT_OPTIONS:

  - The following variable adds options to ssh. If one option is not handled
    by your ssh version just remove it BUT be careful because these options are
    there for security reasons::

      OARSH_OPENSSH_DEFAULT_OPTIONS="-oProxyCommand=none -oPermitLocalCommand=no"