File: README.md

# Open MPI common monitoring module

Copyright (c) 2013-2015 The University of Tennessee and The University
                         of Tennessee Research Foundation.  All rights
                         reserved.
Copyright (c) 2013-2015 Inria.  All rights reserved.

Low-level communication monitoring interface in Open MPI

## Introduction

This interface traces and monitors all messages sent by MPI before
they go to the communication channels. At that level, all
communications are point-to-point: collectives have already been
decomposed into send and receive calls.

The monitoring data is stored internally by each process and written
to stderr at the end of the application (during `MPI_Finalize()`).


## Enabling the monitoring

To enable the monitoring, add `--mca pml_monitoring_enable x` to the
`mpirun` command line:

* If x = 1, it monitors internal and external tags indifferently and aggregates everything.
* If x = 2, it monitors internal tags and external tags separately.
* If x = 0, the monitoring is disabled.
* Other values of x are not supported.

Internal tags are tags < 0. They are used to tag sends and receives
coming from collective operations or from protocol communications.

External tags are tags >= 0. They are used by the application for
point-to-point communications.

Therefore, separating external and internal tags helps to distinguish
between point-to-point and other communications (mainly collectives).

## Output format

The monitoring output looks like this (with `--mca
pml_monitoring_enable 2`):

```
I	0	1	108 bytes	27 msgs sent
E	0	1	1012 bytes	30 msgs sent
E	0	2	23052 bytes	61 msgs sent
I	1	2	104 bytes	26 msgs sent
I	1	3	208 bytes	52 msgs sent
E	1	0	860 bytes	24 msgs sent
E	1	3	2552 bytes	56 msgs sent
I	2	3	104 bytes	26 msgs sent
E	2	0	22804 bytes	49 msgs sent
E	2	3	860 bytes	24 msgs sent
I	3	0	104 bytes	26 msgs sent
I	3	1	204 bytes	51 msgs sent
E	3	1	2304 bytes	44 msgs sent
E	3	2	860 bytes	24 msgs sent
```

Where:

1. the first column distinguishes internal (I) and external (E) tags;
1. the second column is the sender rank;
1. the third column is the receiver rank;
1. the fourth column is the number of bytes sent;
1. the last column is the number of messages.

In this example, process 0 sent 27 messages (108 bytes) to process 1
with internal tags (collectives and protocol-related communications),
and 30 messages (1012 bytes) to process 1 with external tags
(application point-to-point calls).

If the monitoring is enabled with `--mca pml_monitoring_enable 1`,
everything is aggregated and reported under the internal (I) marker.
With the above example, you get:

```
I	0	1	1120 bytes	57 msgs sent
I	0	2	23052 bytes	61 msgs sent
I	1	0	860 bytes	24 msgs sent
I	1	2	104 bytes	26 msgs sent
I	1	3	2760 bytes	108 msgs sent
I	2	0	22804 bytes	49 msgs sent
I	2	3	964 bytes	50 msgs sent
I	3	0	104 bytes	26 msgs sent
I	3	1	2508 bytes	95 msgs sent
I	3	2	860 bytes	24 msgs sent
```
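
If you want to post-process this textual dump yourself, each line follows the
simple tab-separated format shown above. Below is a minimal parsing sketch
(not part of the monitoring component; the names and buffer sizes are chosen
here for illustration) that reads a dump from stdin and prints the average
message size per entry.

```
#include <stdio.h>

/* Parse one line of the monitoring output, e.g.
 *   "E  0  2  23052 bytes  61 msgs sent"
 * Returns 1 on success, 0 otherwise (non-matching lines are skipped). */
static int parse_line(const char *line, char *kind, int *src, int *dst,
                      unsigned long *bytes, unsigned long *msgs)
{
    return 5 == sscanf(line, " %c %d %d %lu bytes %lu msgs sent",
                       kind, src, dst, bytes, msgs);
}

int main(void)
{
    char line[256], kind;
    int src, dst;
    unsigned long bytes, msgs;

    while (fgets(line, sizeof(line), stdin)) {
        if (parse_line(line, &kind, &src, &dst, &bytes, &msgs) && msgs > 0) {
            printf("%c %d -> %d : %.1f bytes/msg\n",
                   kind, src, dst, (double)bytes / (double)msgs);
        }
    }
    return 0;
}
```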

## Monitoring phases

If you want to monitor phases of the application, it is possible to
flush the monitoring at the application level. In that case, all the
monitoring data gathered since the last flush is stored by every
process in a file.

An example of how to flush such monitoring is given in
`test/monitoring/monitoring_test.c`.

Moreover, all the flushed phases are aggregated at runtime and output
at the end of the application, as described above.
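
As a rough sketch of what such a flush can look like from application code,
the snippet below assumes the flush is exposed through the MPI Tool
Information Interface as a performance variable. The variable name
`pml_monitoring_flush`, its GENERIC class, and the write-a-filename
convention are assumptions made here; `test/monitoring/monitoring_test.c`
remains the authoritative reference.

```
#include <mpi.h>
#include <stdio.h>

/* Hedged sketch: ask the monitoring layer to dump everything recorded
 * since the last flush into `filename`.  The pvar name and semantics
 * are assumed; see test/monitoring/monitoring_test.c for real usage. */
static int flush_monitoring(const char *filename)
{
    int provided, idx, count;
    MPI_T_pvar_session session;
    MPI_T_pvar_handle handle;
    MPI_Comm comm = MPI_COMM_WORLD;

    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    if (MPI_SUCCESS != MPI_T_pvar_get_index("pml_monitoring_flush",
                                            MPI_T_PVAR_CLASS_GENERIC, &idx)) {
        MPI_T_finalize();
        return -1;                            /* monitoring not available */
    }
    MPI_T_pvar_session_create(&session);
    MPI_T_pvar_handle_alloc(session, idx, &comm, &handle, &count);
    MPI_T_pvar_start(session, handle);
    MPI_T_pvar_write(session, handle, filename);  /* triggers the dump */
    MPI_T_pvar_stop(session, handle);
    MPI_T_pvar_handle_free(session, &handle);
    MPI_T_pvar_session_free(&session);
    MPI_T_finalize();
    return 0;
}

int main(int argc, char *argv[])
{
    int rank;
    char name[256];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* ... phase 1 communications ... */
    snprintf(name, sizeof(name), "./prof/phase_1_%d.prof", rank);
    flush_monitoring(name);

    /* ... phase 2 communications ... */
    snprintf(name, sizeof(name), "./prof/phase_2_%d.prof", rank);
    flush_monitoring(name);

    MPI_Finalize();
    return 0;
}
```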

## Example

A working example is given in `test/monitoring/monitoring_test.c`. It
features `MPI_COMM_WORLD` monitoring, sub-communicator monitoring,
collective and point-to-point communication monitoring, and phase
monitoring.

To compile:

```
shell$ make monitoring_test
```

## Helper scripts

Two Perl scripts are provided in `test/monitoring`:

1. `aggregate_profile.pl` aggregates the monitoring phases of the
   different processes. This script aggregates the profiles generated
   by the `flush_monitoring` function.

   The files need to be in the format `name_<phase_id>_<process_id>`;
   they are then aggregated by phase.
   If you need the profile of all the phases, you can concatenate the
   different files, or use the output of the monitoring system produced
   at `MPI_Finalize()`.
   In the example, it should be called as:
   ```
   ./aggregate_profile.pl prof/phase
   ```
   to generate `prof/phase_1.prof` and `prof/phase_2.prof`.

1. `profile2mat.pl` transforms the monitoring output into communication
   matrices. It takes a profile file and aggregates all the recorded
   communications into matrices. It generates one matrix for the number
   of messages (msg), one for the total number of bytes transmitted
   (size), and one for the average number of bytes per message (avg),
   i.e. size/msg.

   The output matrices are symmetric.

For instance, the provided example stores the phase output in `./prof`:

```
shell$ mpirun -n 4 --mca pml_monitoring_enable 2 ./monitoring_test
```

This should produce output similar to the following:

```
Proc 3 flushing monitoring to: ./prof/phase_1_3.prof
Proc 0 flushing monitoring to: ./prof/phase_1_0.prof
Proc 2 flushing monitoring to: ./prof/phase_1_2.prof
Proc 1 flushing monitoring to: ./prof/phase_1_1.prof
Proc 1 flushing monitoring to: ./prof/phase_2_1.prof
Proc 3 flushing monitoring to: ./prof/phase_2_3.prof
Proc 0 flushing monitoring to: ./prof/phase_2_0.prof
Proc 2 flushing monitoring to: ./prof/phase_2_2.prof
I	2	3	104 bytes	26 msgs sent
E	2	0	22804 bytes	49 msgs sent
E	2	3	860 bytes	24 msgs sent
I	3	0	104 bytes	26 msgs sent
I	3	1	204 bytes	51 msgs sent
E	3	1	2304 bytes	44 msgs sent
E	3	2	860 bytes	24 msgs sent
I	0	1	108 bytes	27 msgs sent
E	0	1	1012 bytes	30 msgs sent
E	0	2	23052 bytes	61 msgs sent
I	1	2	104 bytes	26 msgs sent
I	1	3	208 bytes	52 msgs sent
E	1	0	860 bytes	24 msgs sent
E	1	3	2552 bytes	56 msgs sent
```

You can then aggregate the phases with:

```
shell$ ./aggregate_profile.pl prof/phase
Building prof/phase_1.prof
Building prof/phase_2.prof
```

And you can build the different communication matrices of phase 1
with:

```
shell$ ./profile2mat.pl prof/phase_1.prof
prof/phase_1.prof -> all
prof/phase_1_size_all.mat
prof/phase_1_msg_all.mat
prof/phase_1_avg_all.mat

prof/phase_1.prof -> external
prof/phase_1_size_external.mat
prof/phase_1_msg_external.mat
prof/phase_1_avg_external.mat

prof/phase_1.prof -> internal
prof/phase_1_size_internal.mat
prof/phase_1_msg_internal.mat
prof/phase_1_avg_internal.mat
```

## Authors

Designed by George Bosilca <bosilca@icl.utk.edu> and
Emmanuel Jeannot <emmanuel.jeannot@inria.fr>