.. _mpix_comm_agree:

MPIX_Comm_agree
===============
.. include_body

:ref:`MPIX_Comm_agree`, :ref:`MPIX_Comm_iagree` - Agree on a flag value
from all live processes and distribute the result back to all live
processes, even after process failures.

This is part of the :ref:`User-Level Fault Mitigation (ULFM) extension <ulfm-label>`.

SYNTAX
------

C Syntax
^^^^^^^^

.. code-block:: c

   #include <mpi.h>
   #include <mpi-ext.h>

   int MPIX_Comm_agree(MPI_Comm comm, int *flag)
   
   int MPIX_Comm_iagree(MPI_Comm comm, int *flag, MPI_Request *request)


Fortran Syntax
^^^^^^^^^^^^^^

.. code-block:: fortran

   USE MPI
   USE MPI_EXT
   ! or the older form: INCLUDE 'mpif.h'

   MPIX_COMM_AGREE(COMM, FLAG, IERROR)
        INTEGER COMM, FLAG, IERROR

   MPIX_COMM_IAGREE(COMM, FLAG, REQUEST, IERROR)
        INTEGER COMM, FLAG, REQUEST, IERROR


Fortran 2008 Syntax
^^^^^^^^^^^^^^^^^^^

.. code-block:: fortran

   USE mpi_f08
   USE mpi_ext_f08

   MPIX_Comm_agree(comm, flag, ierror)
        TYPE(MPI_Comm), INTENT(IN) :: comm
        INTEGER, INTENT(INOUT) :: flag
        INTEGER, OPTIONAL, INTENT(OUT) :: ierror

   MPIX_Comm_iagree(comm, flag, request, ierror)
        TYPE(MPI_Comm), INTENT(IN) :: comm
        INTEGER, INTENT(INOUT), ASYNCHRONOUS :: flag
        TYPE(MPI_Request), INTENT(OUT) :: request
        INTEGER, OPTIONAL, INTENT(OUT) :: ierror

INPUT PARAMETERS
----------------
* ``comm``: Communicator (handle).
* ``flag``: Binary flags (integer).

OUTPUT PARAMETERS
-----------------
* ``flag``: Reduced binary flags (integer).
* ``request``: Request (handle, non-blocking only).
* ``ierror``: Fortran only: Error status (integer).

DESCRIPTION
-----------

This collective communication agrees on the integer value *flag* and
(implicitly) on the group of failed processes in *comm*.

On completion, all non-failed MPI processes have agreed to set the
output integer value of *flag* to the result of a *bitwise AND*
operation over the contributed input values of *flag*.
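
Because the reduction is a bitwise AND, several independent yes/no
conditions can be packed into one *flag*, one bit per condition. The
following is a minimal, MPI-free model of that reduction, not the
library's implementation; the bit names ``STEP_OK`` and ``CHECKPOINT_OK``
and the helper ``agreed_flag`` are hypothetical and exist only for
illustration.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical per-process status bits (illustrative names only). */
    enum {
        STEP_OK       = 1 << 0,  /* local computation succeeded */
        CHECKPOINT_OK = 1 << 1   /* local checkpoint was written */
    };

    /* Model of the reduction: bitwise AND over the contributed values.
     * Starts from all bits set (~0), the identity element of AND. */
    static int agreed_flag(const int *contrib, size_t n)
    {
        assert(contrib != NULL || n == 0);
        int result = ~0;
        for (size_t i = 0; i < n; ++i) {
            result &= contrib[i];
        }
        return result;
    }

A bit survives in the result only if every contributing process set it,
so a single process that clears ``CHECKPOINT_OK`` clears it for everyone.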

:ref:`MPIX_Comm_iagree` is the non-blocking variant of :ref:`MPIX_Comm_agree`.
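
A non-blocking agreement might be overlapped with local work roughly as
sketched here. The helper ``agree_overlapped`` and its ``local_work``
callback are hypothetical, and the sketch assumes that *comm* uses the
``MPI_ERRORS_RETURN`` error handler so that error codes are returned to
the caller.

.. code-block:: c

    #include <mpi.h>
    #include <mpi-ext.h>

    /* Hypothetical helper: start the agreement, run some local work,
     * then wait for completion. Assumes MPI_ERRORS_RETURN on comm.
     * *flag must not be read or modified until MPI_Wait returns. */
    static int agree_overlapped(MPI_Comm comm, int *flag,
                                void (*local_work)(void))
    {
        MPI_Request req;
        int rc = MPIX_Comm_iagree(comm, flag, &req);
        if (rc != MPI_SUCCESS) {
            return rc;
        }
        if (local_work != NULL) {
            local_work();  /* work that does not touch *flag */
        }
        /* The return code of MPI_Wait reports the outcome of the agreement. */
        return MPI_Wait(&req, MPI_STATUS_IGNORE);
    }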

PROCESS FAILURES
----------------

When an MPI process fails before contributing to the agree operation,
the *flag* is computed ignoring its contribution, and the operation
raises an error of class MPIX_ERR_PROC_FAILED.

When an error of class MPIX_ERR_PROC_FAILED is raised, it is consistently
raised at all MPI processes in the group(s) of *comm*.

After :ref:`MPIX_Comm_agree` has raised an error of class MPIX_ERR_PROC_FAILED,
the group produced by a subsequent call to :ref:`MPIX_Comm_get_failed` on
*comm* contains every MPI process that did not contribute to the
computation of *flag*.
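
One way to act on this error is to check the error class and then query
the group of failed processes, as in the following sketch. The helper
name ``agree_and_count_failed`` is hypothetical, and the sketch assumes
that *comm* uses the ``MPI_ERRORS_RETURN`` error handler.

.. code-block:: c

    #include <mpi.h>
    #include <mpi-ext.h>

    /* Hypothetical helper. Assumes MPI_ERRORS_RETURN on comm.
     * Returns 0 if the agreement succeeded, the number of processes
     * reported as failed if it raised MPIX_ERR_PROC_FAILED, and -1
     * for any other error class. */
    static int agree_and_count_failed(MPI_Comm comm, int *flag)
    {
        int rc = MPIX_Comm_agree(comm, flag);
        if (rc == MPI_SUCCESS) {
            return 0;
        }

        int eclass;
        MPI_Error_class(rc, &eclass);
        if (eclass != MPIX_ERR_PROC_FAILED) {
            return -1;
        }

        /* *flag still holds the AND over the processes that contributed. */
        MPI_Group failed;
        int nfailed;
        MPIX_Comm_get_failed(comm, &failed);
        MPI_Group_size(failed, &nfailed);
        if (failed != MPI_GROUP_EMPTY) {
            MPI_Group_free(&failed);
        }
        return nfailed;
    }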

WHEN THE COMMUNICATOR CONTAINS ACKNOWLEDGED FAILURES
----------------------------------------------------

If **all** MPI processes in the group of *comm* have acknowledged the failure
of an MPI process (using :ref:`MPIX_Comm_ack_failed`) prior to the call to
:ref:`MPIX_Comm_agree` (or :ref:`MPIX_Comm_iagree`), the MPIX_ERR_PROC_FAILED
error is not raised when the output value of *flag* ignores the
contribution of that failed process. Note that this is a uniform property:
if a non-contributing process is found to be unacknowledged at any live
process in *comm*, all processes raise an error of class MPIX_ERR_PROC_FAILED.

**Example 1:** Using a combination of :ref:`MPIX_Comm_ack_failed` and
:ref:`MPIX_Comm_agree` users can propagate and synchronize the knowledge
of failures across all MPI processes in *comm*.

.. code-block:: c

    #include <mpi.h>
    #include <mpi-ext.h>

    /* Build, in *g, a group of failed processes that is identical at all
     * live processes of c. Assumes c uses MPI_ERRORS_RETURN. */
    void Comm_get_failed_consistent(MPI_Comm c, MPI_Group *g) {
        int rc; int T = 1;
        int size; int num_acked;
        MPI_Group gf;
        /* one (first, last, stride) triplet for MPI_Group_range_incl */
        int ranges[1][3] = {{0, 0, 1}};

        MPI_Comm_size(c, &size);

        do {
            /* this routine is not pure: calling MPIX_Comm_ack_failed
             * affects the state of the communicator c */
            MPIX_Comm_ack_failed(c, size, &num_acked);
            /* we simply ignore the T value in this example */
            rc = MPIX_Comm_agree(c, &T);
        } while( rc != MPI_SUCCESS );
        /* after this loop, MPIX_Comm_agree has returned MPI_SUCCESS at
         * all processes, so all processes have acknowledged the same set of
         * failures. Let's get that set of failures in the g group. */
        if( 0 == num_acked ) {
            *g = MPI_GROUP_EMPTY;
        }
        else {
            MPIX_Comm_get_failed(c, &gf);
            /* keep only the first num_acked (acknowledged) failures */
            ranges[0][1] = num_acked - 1;
            MPI_Group_range_incl(gf, 1, ranges, g);
            MPI_Group_free(&gf);
        }
    }

WHEN THE COMMUNICATOR IS REVOKED
--------------------------------

This function never raises an error of class MPIX_ERR_REVOKED.
The defined semantics of :ref:`MPIX_Comm_agree` are maintained when *comm*
is revoked, or when the group of *comm* contains failed MPI processes.
In particular, :ref:`MPIX_Comm_agree` is a collective operation, even
when *comm* is revoked.
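
Because the agreement remains usable on a revoked communicator, it can
serve to decide uniformly whether a phase succeeded everywhere, even when
some process has already revoked *comm*. The following is only a sketch
of one such pattern: the helper ``all_succeeded`` is hypothetical, it
assumes the ``MPI_ERRORS_RETURN`` error handler on *comm*, and it relies
on ``MPIX_Comm_revoke`` from the same ULFM extension.

.. code-block:: c

    #include <mpi.h>
    #include <mpi-ext.h>

    /* Hypothetical helper. local_rc is the return code of the caller's
     * preceding communication phase. Returns 1 only if that phase
     * succeeded at every process and no process failure was reported
     * by the agreement. Assumes MPI_ERRORS_RETURN on comm. */
    static int all_succeeded(MPI_Comm comm, int local_rc)
    {
        int ok = (local_rc == MPI_SUCCESS);
        if (!ok) {
            /* Interrupt peers that may still be blocked on comm. */
            MPIX_Comm_revoke(comm);
        }
        /* Still a valid collective, even if comm is now revoked. */
        int rc = MPIX_Comm_agree(comm, &ok);
        return (rc == MPI_SUCCESS) && ok;
    }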

WHEN THE COMMUNICATOR IS AN INTER-COMMUNICATOR
----------------------------------------------

When the communicator is an inter-communicator, the output value of
*flag* is the result of a *bitwise AND* operation over the values
contributed by the remote group.

When an error of class MPIX_ERR_PROC_FAILED is raised, it is consistently
raised at all MPI processes in the group(s) of *comm*, that is, both
the local and remote groups of the inter-communicator.
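
The inter-communicator rule for *flag* can be illustrated with a small,
MPI-free model in which each side receives the AND of the other side's
inputs. This is an illustration of the stated semantics under that
reading, not the library's implementation; ``and_over`` and
``intercomm_agree_model`` are hypothetical names.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    /* Bitwise AND over n values; ~0 is the identity element of AND. */
    static int and_over(const int *v, size_t n)
    {
        assert(v != NULL || n == 0);
        int result = ~0;
        for (size_t i = 0; i < n; ++i) {
            result &= v[i];
        }
        return result;
    }

    /* Model: processes in group A obtain the AND of group B's inputs,
     * and processes in group B obtain the AND of group A's inputs. */
    static void intercomm_agree_model(const int *group_a, size_t na,
                                      const int *group_b, size_t nb,
                                      int *out_a, int *out_b)
    {
        *out_a = and_over(group_b, nb);
        *out_b = and_over(group_a, na);
    }

In this model the two sides can end up with different output values,
since each one reduces only the contributions of the remote group.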

ERRORS
------

.. include:: ./ERRORS.rst

.. seealso::
   * :ref:`MPIX_Comm_is_revoked`
   * :ref:`MPIX_Comm_ack_failed`