1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128
|
.. _mpix_comm_ack_failed:
MPIX_Comm_ack_failed
====================
.. include_body
:ref:`MPIX_Comm_get_failed` - acknowledge failed processes in a communicator.
This is part of the User Level Fault Mitigation :ref:`ULFM extension <ulfm-label>`.
SYNTAX
------
C Syntax
^^^^^^^^
.. code-block:: c
#include <mpi.h>
#include <mpi-ext.h>
int MPIX_Comm_ack_failed(MPI_Comm comm, int num_to_ack, int *num_acked)
Fortran Syntax
^^^^^^^^^^^^^^
.. code-block:: fortran
USE MPI
USE MPI_EXT
! or the older form: INCLUDE 'mpif.h'
MPIX_COMM_ACK_FAILED(COMM, NUM_TO_ACK, NUM_ACKED, IERROR)
INTEGER COMM, NUM_TO_ACK, NUM_ACKED, IERROR
Fortran 2008 Syntax
^^^^^^^^^^^^^^^^^^^
.. code-block:: fortran
USE mpi_f08
USE mpi_ext_f08
MPIX_Comm_ack_failed(comm, num_to_ack, num_acked, ierror)
TYPE(MPI_Comm), INTENT(IN) :: comm
INTEGER, INTENT(IN) :: num_to_ack
INTEGER, INTENT(OUT) :: num_acked
INTEGER, OPTIONAL, INTENT(OUT) :: ierror
INPUT PARAMETERS
----------------
* ``comm``: Communicator (handle).
* ``num_to_ack``: maximum number of process failures to acknowledge in *comm* (integer)
OUTPUT PARAMETERS
-----------------
* ``num_acked``: number of acknowledged failures in *comm* (integer).
* ``ierror``: Fortran only: Error status (integer).
DESCRIPTION
-----------
his local operation gives the users a way to **acknowledge**
locally notified failures on *comm*. The operation acknowledges the first
*num_to_ack* process failures on *comm*, that is, it acknowledges the
failure of members with a rank lower than *num_to_ack* in the group that
would be produced by a concurrent call to :ref:`MPIX_Comm_get_failed` on
the same *comm*.
The operation also sets the value of *num_acked* to the current number of
acknowledged process failures in *comm*, that is, a process failure has been
acknowledged on *comm* if and only if the rank of the process is lower than
*num_acked* in the group that would be produced by a subsequent call to
:ref:`MPIX_Comm_get_failed` on the same *comm*.
*num_acked* can be larger than *num_to_ack* when process failures have been
acknowledged in a prior call to :ref:`MPIX_Comm_ack_failed`.
EFFECT OF ACKNOWLEDGING FAILURES
--------------------------------
After an MPI process failure is acknowledged on *comm*, unmatched
MPI_ANY_SOURCE receive operations on the same *comm* that would have raised
an error of class MPIX_ERR_PROC_FAILED_PENDING proceed without further raising
errors due to this acknowledged failure.
Also, :ref:`MPIX_Comm_agree` on the same *comm* will not raise an error of
class MPI_ERR_PROC_FAILED due to this acknowledged failure.
USAGE PATTERNS
--------------
One may query, without side effect, for the number of currently aknowledged
process failures *comm* by supplying 0 in *num_to_ack*.
Conversely, one may unconditionally acknowledge all currently known process
failures in *comm* by supplying the size of the group of *comm* in *num_to_ack*.
Note that the number of acknowledged processes, as returned in *num_acked*,
can be smaller or larger than the value supplied in *num_to_ack*; It is
however never larger than the size of the group returned by a subsequent call
to :ref:`MPIX_Comm_get_failed`.
EFFECT ON COLLECTIVE OPERATIONS
-------------------------------
Calling :ref:`MPIX_Comm_ack_failed` on a communicator with failed MPI
processes has no effect on collective operations (except for :ref:`MPIX_Comm_agree`).
If a collective operation would raise an error due to the communicator
containing a failed process it will continue to raise an error even after
the failure has been acknowledged. In order to use collective operations
between MPI processes of a communicator that contains failed MPI processes,
users should create a new communicator (e.g., by calling :ref:`MPIX_Comm_shrink`).
WHEN COMMUNICATOR IS AN INTER-COMMUNICATOR
------------------------------------------
When the communicator is an inter-communicator, the failures of members
in both the local and the remote groups of *comm* are acknowledged.
ERRORS
------
.. include:: ./ERRORS.rst
.. seealso::
* :ref:`MPIX_Comm_get_failed`
* :ref:`MPIX_Comm_agree`
|