.. _mpix_comm_ack_failed:

MPIX_Comm_ack_failed
====================
.. include_body

:ref:`MPIX_Comm_ack_failed` - Acknowledge failed processes in a communicator.

This function is part of the User-Level Fault Mitigation :ref:`ULFM extension <ulfm-label>`.

SYNTAX
------

C Syntax
^^^^^^^^

.. code-block:: c

   #include <mpi.h>
   #include <mpi-ext.h>

   int MPIX_Comm_ack_failed(MPI_Comm comm, int num_to_ack, int *num_acked)

Fortran Syntax
^^^^^^^^^^^^^^

.. code-block:: fortran

   USE MPI
   USE MPI_EXT
   ! or the older form: INCLUDE 'mpif.h'

   MPIX_COMM_ACK_FAILED(COMM, NUM_TO_ACK, NUM_ACKED, IERROR)
        INTEGER COMM, NUM_TO_ACK, NUM_ACKED, IERROR

Fortran 2008 Syntax
^^^^^^^^^^^^^^^^^^^

.. code-block:: fortran

   USE mpi_f08
   USE mpi_ext_f08

   MPIX_Comm_ack_failed(comm, num_to_ack, num_acked, ierror)
        TYPE(MPI_Comm), INTENT(IN) :: comm
        INTEGER, INTENT(IN) :: num_to_ack
        INTEGER, INTENT(OUT) :: num_acked
        INTEGER, OPTIONAL, INTENT(OUT) :: ierror

INPUT PARAMETERS
----------------
* ``comm``: Communicator (handle).
* ``num_to_ack``: maximum number of process failures to acknowledge in *comm* (integer).

OUTPUT PARAMETERS
-----------------
* ``num_acked``: number of acknowledged failures in *comm* (integer).
* ``ierror``: Fortran only: Error status (integer).

DESCRIPTION
-----------

This local operation gives users a way to **acknowledge**
locally notified failures on *comm*. The operation acknowledges the first
*num_to_ack* process failures on *comm*; that is, it acknowledges the
failure of members with a rank lower than *num_to_ack* in the group that
would be produced by a concurrent call to :ref:`MPIX_Comm_get_failed` on
the same *comm*.

The operation also sets *num_acked* to the current number of acknowledged
process failures in *comm*. In other words, a process failure has been
acknowledged on *comm* if and only if the rank of the failed process is lower
than *num_acked* in the group that would be produced by a subsequent call to
:ref:`MPIX_Comm_get_failed` on the same *comm*.

*num_acked* can be larger than *num_to_ack* when process failures have been
acknowledged in a prior call to :ref:`MPIX_Comm_ack_failed`.
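
The relationship between *num_to_ack* and *num_acked* described above can be
pictured with a small stand-alone model. This is only a sketch of the
documented semantics, not Open MPI's implementation: the ``ack_model`` type
and the ``model_ack_failed`` function are hypothetical names introduced for
illustration, and the model assumes that the list of locally known failures
only grows over time.

.. code-block:: c

   /* Hypothetical toy model; not part of Open MPI or the ULFM API. */
   typedef struct {
       int num_known_failed; /* size of the group MPIX_Comm_get_failed would return */
       int num_acked;        /* number of failures acknowledged so far */
   } ack_model;

   /* Models MPIX_Comm_ack_failed(comm, num_to_ack, &num_acked):
    * acknowledge the first num_to_ack known failures, never undo an
    * earlier acknowledgement, and return the resulting total. */
   int model_ack_failed(ack_model *m, int num_to_ack)
   {
       int target = num_to_ack < m->num_known_failed
                        ? num_to_ack
                        : m->num_known_failed;

       if (target > m->num_acked) {
           m->num_acked = target;
       }
       return m->num_acked;
   }

In this model, with three known failures, a call with 2 followed by a call
with 1 returns 2 both times, which mirrors how *num_acked* can exceed
*num_to_ack* after a prior call.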

EFFECT OF ACKNOWLEDGING FAILURES
--------------------------------

After an MPI process failure is acknowledged on *comm*, unmatched
MPI_ANY_SOURCE receive operations on the same *comm* that would have raised
an error of class MPIX_ERR_PROC_FAILED_PENDING proceed without further raising
errors due to this acknowledged failure.

Also, :ref:`MPIX_Comm_agree` on the same *comm* will not raise an error of
class MPIX_ERR_PROC_FAILED due to this acknowledged failure.
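
A minimal sketch of how acknowledgement might be used to resume a wildcard
receive follows. The helper name ``recv_any_tolerant`` is hypothetical. The
sketch assumes that the error handler on *comm* is MPI_ERRORS_RETURN, that
*comm* is an intra-communicator, and that the receive request remains pending
after an error of class MPIX_ERR_PROC_FAILED_PENDING, so that it can be waited
on again.

.. code-block:: c

   #include <mpi.h>
   #include <mpi-ext.h>

   /* Hypothetical helper: receive one int from any sender, acknowledging
    * process failures that interrupt the wildcard receive. */
   int recv_any_tolerant(MPI_Comm comm, int tag, int *value, MPI_Status *status)
   {
       MPI_Request req;
       int rc = MPI_Irecv(value, 1, MPI_INT, MPI_ANY_SOURCE, tag, comm, &req);
       if (rc != MPI_SUCCESS) {
           return rc;
       }

       for (;;) {
           int eclass = MPI_SUCCESS;

           rc = MPI_Wait(&req, status);
           if (rc != MPI_SUCCESS) {
               MPI_Error_class(rc, &eclass);
           }
           if (eclass != MPIX_ERR_PROC_FAILED_PENDING) {
               return rc; /* completed, or failed for another reason */
           }

           /* Acknowledge all failures known so far, then wait again
            * (assumption: the request is still pending at this point). */
           int size, num_acked;
           MPI_Comm_size(comm, &size);
           rc = MPIX_Comm_ack_failed(comm, size, &num_acked);
           if (rc != MPI_SUCCESS) {
               return rc;
           }
       }
   }

If every potential sender has failed, this loop would wait indefinitely; an
application would typically inspect the group returned by
:ref:`MPIX_Comm_get_failed` before deciding to wait again.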

USAGE PATTERNS
--------------

One may query, without side effects, the number of currently acknowledged
process failures in *comm* by supplying 0 in *num_to_ack*.

Conversely, one may unconditionally acknowledge all currently known process
failures in *comm* by supplying the size of the group of *comm* in *num_to_ack*.

Note that the number of acknowledged process failures, as returned in
*num_acked*, can be smaller or larger than the value supplied in *num_to_ack*;
it is, however, never larger than the size of the group returned by a
subsequent call to :ref:`MPIX_Comm_get_failed`.
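
The two patterns above might be wrapped as follows. The helper names
``query_acked_failures`` and ``ack_all_known_failures`` are hypothetical, and
the sketch assumes that *comm* is an intra-communicator, so that
``MPI_Comm_size`` gives the size of the whole group of *comm*.

.. code-block:: c

   #include <mpi.h>
   #include <mpi-ext.h>

   /* Hypothetical helper: report how many failures are already
    * acknowledged on comm, without acknowledging any new ones. */
   int query_acked_failures(MPI_Comm comm, int *num_acked)
   {
       return MPIX_Comm_ack_failed(comm, 0, num_acked);
   }

   /* Hypothetical helper: acknowledge every failure currently known
    * locally on comm by passing the group size as the upper bound. */
   int ack_all_known_failures(MPI_Comm comm, int *num_acked)
   {
       int size;
       int rc = MPI_Comm_size(comm, &size);
       if (rc != MPI_SUCCESS) {
           return rc;
       }
       return MPIX_Comm_ack_failed(comm, size, num_acked);
   }

Both helpers return the MPI error code of the underlying call, so they are
most useful when the error handler on *comm* is MPI_ERRORS_RETURN.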

EFFECT ON COLLECTIVE OPERATIONS
-------------------------------

Calling :ref:`MPIX_Comm_ack_failed` on a communicator with failed MPI
processes has no effect on collective operations (except for :ref:`MPIX_Comm_agree`).
If a collective operation would raise an error because the communicator
contains a failed process, it continues to raise that error even after
the failure has been acknowledged. In order to use collective operations
between MPI processes of a communicator that contains failed MPI processes,
users should create a new communicator (e.g., by calling :ref:`MPIX_Comm_shrink`).
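
As a rough illustration of that recommendation, the sketch below builds a
replacement communicator from the surviving processes. The helper name
``rebuild_from_survivors`` is hypothetical, and the choice to install
MPI_ERRORS_RETURN on the new communicator is an assumption about how the
application wants to handle later failures.

.. code-block:: c

   #include <mpi.h>
   #include <mpi-ext.h>

   /* Hypothetical helper: create a communicator that contains only the
    * surviving processes of "damaged". MPIX_Comm_shrink is collective,
    * so every surviving process must make this call. */
   int rebuild_from_survivors(MPI_Comm damaged, MPI_Comm *survivors)
   {
       int rc = MPIX_Comm_shrink(damaged, survivors);
       if (rc != MPI_SUCCESS) {
           return rc;
       }
       /* Assumption: later failures should be reported as error codes. */
       return MPI_Comm_set_errhandler(*survivors, MPI_ERRORS_RETURN);
   }

Collective operations can then be issued on the new communicator, while the
original one remains usable only for operations that tolerate the
acknowledged failures.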

WHEN COMMUNICATOR IS AN INTER-COMMUNICATOR
------------------------------------------

When the communicator is an inter-communicator, the failures of members
in both the local and the remote groups of *comm* are acknowledged.

ERRORS
------

.. include:: ./ERRORS.rst

.. seealso::
   * :ref:`MPIX_Comm_get_failed`
   * :ref:`MPIX_Comm_agree`