File: ecp_hardware.rst

package info (click to toggle)
adios2 2.10.2%2Bdfsg1-2
  • links: PTS, VCS
  • area: main
  • in suites: forky, trixie
  • size: 33,764 kB
  • sloc: cpp: 175,964; ansic: 160,510; f90: 14,630; yacc: 12,668; python: 7,275; perl: 7,126; sh: 2,825; lisp: 1,106; xml: 1,049; makefile: 579; lex: 557
file content (61 lines) | stat: -rw-r--r-- 2,311 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
######################
ADIOS2 in ECP hardware
######################

ADIOS2 is widely used in ECP (Exascale Computing Project) HPC (high performance
computing) systems, some particular ADIOS2 features needs from specifics
workarounds to run successfully.

OLCF CRUSHER
============

SST MPI Data Transport
----------------------

MPI Data Transport relies on client-server features of MPI which are currently
supported in Cray-MPI implementations with some caveats. Here are some of the
observed issues and what its workaround (if any) are:

**MPI_Finalize** will block the system process in the "Writer/Producer" ADIOS2
instance. The reason is that the Producer ADIOS instance internally calls
`MPI_Open_port` which somehow even after calling `MPI_Close_port` `MPI_Finalize`
still consider its port to be in used, hence blocking the process. The
workaround is to use a `MPI_Barrier(MPI_COMM_WORLD)` instead of `MPI_Finalize()`
call.

**srun does not understand mpmd instructions** Simply disable them with the flag
`-DADIOS2_RUN_MPI_MPMD_TESTS=OFF`

**Tests timeout**  Since we launch every tests with srun the scheduling times
can exceed the test default timeout. Use a large timeout (5mins) for running
your tests.

Examples of launching ADIOS2 SST unit tests using MPI DP:

.. code-block:: bash

  # We omit some of the srun (SLURM) arguments which are specific of the project
  # you are working on. Note that you could avoid calling srun directly by
  # setting the CMAKE variable `MPIEXEC_EXECUTABLE`.

  # Launch simple writer test instance
  srun {PROJFLAGS } -N 1 /gpfs/alpine/proj-shared/csc331/vbolea/ADIOS2-build/bin/TestCommonWrite SST mpi_dp_test CPCommPattern=Min,MarshalMethod=BP5

  # On another terminal launch multiple instances of the Reader test
  srun {PROJFLAGS} -N 2 /gpfs/alpine/proj-shared/csc331/vbolea/ADIOS2-build/bin/TestCommonRead SST mpi_dp_test

Alternatively, you can configure your CMake build to use srun directly:

.. code-block:: bash

  cmake . -DMPIEXEC_EXECUTABLE:FILEPATH="/usr/bin/srun" \
       -DMPIEXEC_EXTRA_FLAGS:STRING="-A{YourProject} -pbatch -t10" \
       -DMPIEXEC_NUMPROC_FLAG:STRING="-N" \
       -DMPIEXEC_MAX_NUMPROCS:STRING="-8" \
       -DADIOS2_RUN_MPI_MPMD_TESTS=OFF

  cmake --build .
  ctest

  # monitor your jobs
  watch -n1 squeue -l -u $USER