.. _using_backup_servers:

Using backup servers
********************


The :code:`ecflow_client` can be configured to contact alternate backup servers in case the primary server is not available. This is typically most useful in operational environments.

The use of backup servers is, by default, enabled only for :term:`Task commands <child command>`. This behaviour can be customized by setting the environment variable
:code:`ECF_HOSTFILE_POLICY`. This variable can take the following values:

- :code:`task` (default): backup servers are used only for Task commands.
- :code:`all`: backup servers are used for all commands, including Task and User commands.
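
For example, to extend the use of backup servers to User commands as well, the variable could be set in the shell before invoking the client. This is a minimal sketch; only the variable name and its values are taken from the description above:

.. code-block:: shell

   # Use backup servers for all commands (Task and User commands)
   export ECF_HOSTFILE_POLICY=all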

The list of backup servers is read from a file whose location is given by the environment variable :code:`ECF_HOSTFILE`. By convention, the file is located at :code:`$HOME/.ecf_hostfile`, and has the following format:

.. code-block:: shell

   # This is a comment
   host1             # port 3141 is used by default, when not specified
   host2:port2
   host3:port3
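
To check how the entries of such a file resolve to host and port pairs, a small shell helper can be used. The sketch below is only an illustration of the format described above, not the parser used by :code:`ecflow_client`; the function name :code:`list_backup_servers` is hypothetical:

.. code-block:: shell

   # Hypothetical helper, not part of ecFlow: print one host:port pair per entry
   list_backup_servers() {
       # Drop comments and surrounding whitespace from every line
       sed -e 's/#.*$//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" |
       while IFS= read -r entry; do
           # Skip lines that are empty once comments are removed
           if [ -z "$entry" ]; then
               continue
           fi
           case "$entry" in
               *:*) printf '%s\n' "$entry" ;;        # port given explicitly
               *)   printf '%s:3141\n' "$entry" ;;   # default port 3141
           esac
       done
   }

Calling :code:`list_backup_servers "$HOME/.ecf_hostfile"` would then print each backup server with its effective port.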

To enable the :code:`ecflow_client` to read the file and use the listed backup servers, the environment variable :code:`ECF_HOSTFILE` must be set before running the :code:`ecflow_client` command:

.. code-block:: shell

  export ECF_HOSTFILE=$HOME/.ecf_hostfile

.. important::

    The maximum retry period is defined by :code:`ECF_TIMEOUT`, which by default is set to 24 hours.
    This means that the :code:`ecflow_client` will continue to loop over the list, retrying the primary host followed by the alternate hosts, for up to :code:`ECF_TIMEOUT` before giving up and reporting a failure.
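
Where a shorter retry period is preferred, :code:`ECF_TIMEOUT` could be lowered before running the client. The sketch below assumes that the value is expressed in seconds (so that the 24 hour default corresponds to 86400); this unit is an assumption and is not stated above:

.. code-block:: shell

   # Assumption: ECF_TIMEOUT is given in seconds (24 hours = 24 * 60 * 60 = 86400)
   # Give up after one hour instead of the default 24 hours
   export ECF_TIMEOUT=3600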

.. warning::

    When executing a command, the :code:`ecflow_client` will always first try to connect to the primary host, as defined by command line options or by :code:`ECF_HOST` and :code:`ECF_PORT`.
    If the first attempt to contact the primary host fails, the client will automatically retry contacting the primary server after waiting for a retry period of 10 seconds.
    Only after this second attempt has failed will the :code:`ecflow_client` immediately try to connect to the backup servers listed in the :code:`ECF_HOSTFILE`.

    This implies that the :code:`ecflow_client` will not try to connect to the backup servers immediately, and thus contacting a backup server incurs a minimum delay of 10 seconds.
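
The order of attempts described above can be visualised with a short shell sketch. It is only an illustration, not ecFlow code: the function name :code:`print_attempt_order` and the host names are hypothetical, only the first pass over the hosts is shown, and the time taken by each failed connection attempt is ignored:

.. code-block:: shell

   # Illustration only (not ecFlow code): print the order of connection attempts
   # in the first pass, with the earliest time at which each attempt can start.
   print_attempt_order() {
       primary=$1
       shift
       retry_period=10   # seconds between the two attempts on the primary host

       echo "t=0s: $primary (primary, first attempt)"
       echo "t=${retry_period}s: $primary (primary, second attempt)"
       # Backup servers are tried only after the second attempt has failed
       for backup in "$@"; do
           echo "t=${retry_period}s: $backup (backup)"
       done
   }

   print_attempt_order primary:3141 host1:3141 host2:3142

The first backup server appears no earlier than 10 seconds after the command was started, which is the minimum delay mentioned in the warning.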