.TH XMPI 1 "November, 1999" "2.2" "LAM X11 TOOLS"
.SH NAME
XMPI \- X Window MPI user interface
.SH SYNTAX
xmpi [-h] [<boot_schema>]
.SH DESCRIPTION
.I XMPI
is a graphical user interface for running MPI programs, monitoring MPI
processes and messages, and viewing execution trace files.
It exploits the debugging capabilities of LAM, a parallel computing
environment for UNIX clusters.
.I XMPI
is constructed from the Motif widget set.
.PP
.I XMPI
does not provide an interface for starting a LAM session.
This must be accomplished prior to running
.IR XMPI ,
which is itself a LAM program.
The boot schema from which LAM was started should be provided to
.I XMPI
so that it may be presented as an inventory of nodes on which programs
may be run.
If
.I XMPI
is to be used only to view trace files then starting LAM is not required.
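.PP
As a sketch of a typical session, assuming a boot schema file with the
hypothetical name
.I hosts
in the current directory, LAM is first started with lamboot(1) and the
same file is then passed to
.IR XMPI :
.PP
.RS
.nf
lamboot hosts
xmpi hosts
.fi
.RE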
.PP
This description assumes a basic knowledge of MPI.
.SH TYPICAL USAGE
.I XMPI
provides a graphical display of the state of the processes
within an MPI application.
The state information is obtained from one of two sources:
a running application started by
.I XMPI
or a file containing trace data from a traced MPI application.
When
.I XMPI
is started, its top-level overview window is blank.
Once an application is started or a trace file is loaded, the
overview window fills with a tiled group of hexagons, each representing
the state of one MPI process and labeled by the process rank within
MPI_COMM_WORLD.
A traffic light symbol indicates whether the process is running or blocked.
No traffic light is shown for processes which have either finalized or not yet
initialized the MPI library.
.PP
When monitoring a running application the camera "Snap" button or
"Snapshot" item in the "Application" menu updates the state information
on all processes at any time.
When viewing trace data the state information is updated according to
the currently selected time point (see "XMPI TRACE FILES").
.PP
A mouse click inside a hexagon pops up an additional window containing
more detailed information about the process.
If the process is blocked, the function name, peer process rank, communicator,
message tag and element count are displayed.
If unreceived messages are available, their quantity, source process rank,
communicator, message tag and element count are displayed.
By leaving a few process windows on the screen, a user can focus debugging
on a small and manageable collection of misbehaving processes.
.PP
The "Clean"
button or "Clean" item in the "Application" menu terminates an
application so that the development cycle can be repeated.
The previous application can be rerun with the "Rerun" button or
"Rerun" item in the "Application" menu.
.SH RUNNING AN APPLICATION
An application schema specifies an MPI application by listing each
process's program name, program location, target processor(s) and
optional command line arguments.
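.PP
As an illustration only, a small application schema might look like the
following sketch.
The program names
.I master
and
.I slave
and the argument are hypothetical, and the exact line syntax is an
assumption that should be checked against the application schema
documentation of the installed LAM release.
The node selectors are those described under "Node Specification".
.PP
.RS
.nf
# one master on the local node, one slave on every usable node
h master 42.0
N slave
.fi
.RE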
.PP
The "Browse&Run" item in the "Application" menu pops up a simple file
browser for choosing and running a pre-written application schema.
Alternatively an application schema can be configured with the
.I XMPI
application builder dialog, invoked by the "Build&Run" item in the
"Application" menu.
.PP
The builder dialog has an area to specify each process and
an arrow button to add it to the application schema, which is shown
below the arrow button in a scrolled list.
The lines in the list show the syntax that would be used in creating
the same application with a text editor.
Indeed, the "Save" button saves the application schema in a file
for later use and/or editing.
.PP
A specified process does not become part of the application until
the arrow (commit) button is pressed.
Once it appears in the application scrolled list, a process can be
deleted by selecting it and pressing the <Delete> key.
.PP
Pressing the "Run" button with anything in the application list causes
that application to be run.
The overview window is then initialized with the status of the
application.
.SS Program Specification
A file browser in the middle of the builder dialog aids in selecting
a program file.
The browser only navigates the file space of the node running
.IR XMPI .
If a program is located on another node outside the file space
(outside NFS, etc.) its pathname may need to be typed into the process
specification area.
Selecting the "Use Full Pathname" toggle button will cause programs
to be placed into the application schema as full pathnames.
.PP
.I XMPI
limits the choice of a program source node to either the node running
.I XMPI
or the process target node.
The latter case is the default and is the most efficient because
LAM does not need to transfer the program from source to target node.
The "Transfer Program" toggle button selects the source node policy.
.SS Multiple Program Copies
The number of copies of a program to be run can be set in the process
specification area. Clicking on the increment or decrement arrow will
increment or decrement the count by one. Clicking with the shift key
down will increment or decrement by ten.
.SS Command-line Arguments
Command-line arguments must be typed into the process specification area.
.SS Node Specification
A boot schema specifies the computers participating as nodes
in a LAM multicomputer.
If
.I XMPI
is given a boot schema filename, its contents will appear in a scrolled
list on the right side of the builder dialog.
.I XMPI
will search for the given schema in the local directory.
The boot schema filename is displayed above the list of its nodes.
Multiple target nodes can be selected from the scrolled list with the
corresponding node mnemonic appearing in the process specification area.
Selecting multiple target nodes specifies multiple processes with the program
name, arguments and source node policy held constant.
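.PP
In its simplest form a boot schema is a text file listing one host name
per line.
A minimal sketch, with hypothetical host names:
.PP
.RS
.nf
# three-node LAM multicomputer
node1.example.com
node2.example.com
node3.example.com
.fi
.RE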
.PP
If no boot schema was specified only the special node selectors "LOCAL"
(meaning the node on which
.I XMPI
is running) and "ALL NODES" are provided.
.PP
Target node descriptions may also be typed directly into the process
specification area.
The local node is specified as \fIh\fR.
The origin node from which the machine was booted, if not local,
can be specified as \fIo\fR.
All usable nodes are specified as \fIN\fR.
Nodes are generically identified as
.IR n<list> ,
where <list> can be a single node identifier or a list of node identifiers.
Identifiers can be written in decimal or hexadecimal notation.
Examples are
.I n1
or
.IR n0-7,0x10 .
.SS Run-time Options
Applications can be run with various run-time options to specify
the behaviour of the MPI library.
These can be configured from a separate dialog which is activated from
the "Runtime" item in the "Options" menu.
Options remain in effect until changed.
The following options can be set:
.PP
.IP \(bu
tracing mode (default enabled)
.IP \(bu
fast client-to-client communication (default disabled)
.IP \(bu
GER protocol and error detection (default enabled)
.IP \(bu
homogeneous LAM node optimization (default disabled)
.SH FOCUSING ON A PROCESS
More information on a process's state can be obtained by clicking the
left mouse button within the process hexagon.
This will pop up a focus window.
The upper area of the focus window is the process area and
displays the current state of the process.
The lower area is the message area and displays information on the
process's message queue.
.PP
The focus window banner contains a tack button which can be clicked to
dismiss the window and a label containing the process's identity along with
the program name.
In
.I XMPI
processes are identified first by their rank in MPI_COMM_WORLD and, if
the process is communicating, by a slash followed by the process's
rank within the current communicator.
The focus window can also be dismissed by clicking once
again in the process hexagon.
.PP
The process area describes the current state of the process together
with the name of and (where appropriate) arguments to the MPI function
currently being executed.
The layout is fairly self-explanatory and we describe only the less
obvious features.
.SS Communicator Identification
The "comm" area shows the communicator being used in the current
MPI function.
Communicators are opaque
objects which MPI does not identify in any meaningful, printable way.
LAM's MPI implementation adds a simple numerical identifier to
communicators, which is displayed in
.I XMPI
as <x>
where \fIx\fR is the identifier.
This identifier can be matched to communicator variables in an MPI program with
the LAM function MPIL_Comm_id(2).
.SS Group Membership
The button to the right of the "comm" area will highlight in the
overview window the hexagons of the processes in the communicator.
For an intracommunicator, the hexagons will be highlighted in the
color specified by the "lcomCol" resource.
For an intercommunicator, processes in the local group will be highlighted in
the color specified by the "lcomCol" resource and those in the remote
group in the color specified by the "rcomCol" resource.
For highlighted processes the process identification at the bottom of the
hexagon is changed to be the rank in MPI_COMM_WORLD followed by a slash and the
rank in the communicator being highlighted.
.SS Datatype
The datatype button to the right of the "cnt" area will display
in the datatype window (see "DATATYPE WINDOW") the type map of the datatype
argument to the current MPI function.
.PP
The message area describes the current state of the queue of messages destined
for the process and not yet received.
Once again the layout is fairly self-explanatory
and we describe only the less obvious features.
.SS Message Aggregates
Identical undelivered messages are aggregated. The "copy"
area shows the number of messages
within the visible aggregate, followed by the total number
of messages in the queue.
The button to the right of the "copy" area cycles through the message
aggregates.
.SS Source Rank
The "src" area shows the rank of the source process within
MPI_COMM_WORLD followed by the rank of the source process in the communicator in
which the message was sent.
.SS Datatype
The datatype button to the right of the "cnt" area
will display in the datatype window the type map of the message's datatype.
.SS Group Membership
The button to the right of the "comm" area
will highlight the message communicator in the manner previously described.
.SH XMPI TRACE FILES
.I XMPI
can be used to view existing trace files and can be used to create
trace files for applications run under
.IR XMPI .
.PP
To load and view an existing trace file select the "View" item in the
"Trace" menu.
.PP
If an application is run under
.I XMPI
with tracing enabled (the default), LAM
will trace the application.
Before the trace data can be viewed in
.I XMPI
it must be dumped to a file.
This is done by selecting the "Dump" item from the "Trace" menu.
You will be prompted for a file name.
By convention
.I XMPI
trace files have a ".lamtr" suffix.
The trace file can be viewed by loading it as described above.
As a shortcut select the "Express" item in the "Trace" menu, or equivalently
click the "Trace" button in the overview window.
This dumps the trace data to a temporary file and then
immediately loads the file for viewing.
If you decide that you want to save trace data for later viewing then you
must dump it using the "Dump" item from the "Trace" menu.
Dumping trace data to file does not purge
any trace data and a subsequent dump will contain all the trace data
from the start of the application up until the time of dumping.
Terminating an application via the "Clean" button or menu item purges
all trace data.
.PP
While a trace is being viewed, an application previously launched by
.I XMPI
continues to run in the background.
When the trace window is closed,
.I XMPI
returns to snapshot mode if there is a running application.
.PP
When loading trace files containing multiple segments (see
MPIL_Trace_on(2) and MPIL_Trace_off(2)) you will be prompted for the
number of the segment you wish to view.
If you wish later to view a different segment, simply reload the trace
file and specify the new segment number when prompted.
Reloading is done via the "View" or "Express" items in the "Trace" menu.
.SS Communication Timeline Window
Across the top of the timeline window is a control and information area.
The trace data is displayed below this on timelines, one per process in
the traced application.
The state of the application at a particular
time is represented by the corresponding traffic light color.
Green represents running, red represents blocked waiting on communication and
yellow represents time spent inside an MPI function not blocked on
communication (we call this system overhead time as it typically
represents time doing data conversion, message packing, etc.).
.PP
The dial can be used to select a time point at
which the process states are to be displayed.
In the overview window the
process states at the dial time are displayed in hexagon form.
As with snapshot mode more detailed information on a process can be
obtained by bringing up its focus window.
The dial may be moved by clicking with the left button in the trace view
area or via the VCR controls.
Below the VCR controls are displayed, from left to right, the time of
the left edge of the displayed timeline, the current dial time and the time of
the right edge of the displayed timeline.
.PP
To the right of the VCR controls is displayed the current magnification.
When a trace file is loaded
.I XMPI
chooses an initial scaling factor and sets this to be the 1x1 magnification.
You can increase and decrease the magnification using the zoom and
un-zoom buttons.
.PP
A segment of the currently displayed timeline can be selected by
dragging the right mouse button in the timeline display area. Upon
release of the right button the display is zoomed to show the selected
segment.
To cancel a drag in progress, drag the cursor up or down out of the
timeline display area.
.SS How Communication Is Represented
.IP \fICollective\fR 4
A collective communication is represented for each process by contiguous line
segments showing the time spent in system overhead and the time spent
blocked waiting for communication.
No lines are drawn connecting the
processes participating in the collective communication.
.IP \fIBlocking point to point\fR 4
For both the send and receive process contiguous line segments are drawn
showing the time spent in system overhead and the time spent blocked
waiting for the communication to complete.
A line is drawn connecting the send to the receive.
It originates at the beginning of the send
segments and is drawn to the end of the matching receive segments.
.IP \fINon-blocking point to point\fR 4
At the time a non-blocking send or receive is initiated a system
overhead segment is drawn.
When the communication is completed via a wait or test, segments showing
system overhead and blocking time are drawn.
Lines are drawn between matching sends and receives, except in this case
the line is drawn from the segment where the send was initiated to where the
corresponding receive completed.
.IP \fIWaits and tests\fR 4
If a non-blocking communication is completed inside a
wait/test function
.I XMPI
will show the function name in the focus window as
the wait/test function followed in parentheses by the send/receive
function being completed.
For example, if an MPI_Issend() is completed
inside an MPI_Wait(), the function will read
\fIMPI_Wait (MPI_Issend)\fR.
.IP \fIMissing traces\fR 4
Owing to the use of trace segments or the dropping of overflow traces
(see lamtrace(1)) there may be send or receive traces which have no match in
the trace data.
In these cases a short stub line is drawn out from a send or in to a receive.
.SS Kiviat Window
When viewing a trace file, the "Kiviat" button or
"Kiviat" item from the "Trace" menu brings up the Kiviat window.
This window displays, in a segmented pie-chart format, the cumulative time
up to the current dial time, spent by each process in the running, overhead
and blocked states.
.SH MESSAGE SOURCE MATRIX
The message source window displays a square matrix of process message
queue lengths.
For each process it shows the number of queued messages from each other
process in the application.
It can be brought up while monitoring a running application or while viewing a
trace file, by selecting the "Matrix" button or "Matrix" item in the
"Trace" menu.
.SH DATATYPE WINDOW
The datatype window displays a textual representation of the type map of an
MPI datatype.
This window is associated at any instant with a particular
process and mode.
The associated process is shown in the window's banner and the mode is
indicated by a traffic light or message queue icon shown in the left
part of the window.
When in process mode the datatype being shown, if any,
is the datatype argument of the MPI function the process is executing.
When in message mode the datatype is that
of the current message aggregate selected in the process focus window.
Switching between processes and modes is effected via the
datatype buttons in the process focus windows.
.PP
The type map might not fit completely into the default size window.
Simply resize the window to see the whole map.
.SH SWITCHING INFORMATION SOURCES
.I XMPI
will gather and display information from either the currently executing
application or a trace file.
When an application is launched from
.IR XMPI ,
the information source is the executing application and the "Snap" button
is active.
Though the application may be producing trace data, the "Snap" button
does not use it, but instead acquires information from debugging hooks
in the MPI implementation.
At any moment, an existing trace file may be loaded into
.I XMPI
or the currently accumulating trace data may be fetched from the MPI
implementation, stored in a file, and loaded.
This action changes the information source to the loaded trace file.
Information display is now controlled from the dial in the timeline window
and not from the "Snap" button, which is now inactive.
Though the application may still be running, the timeline dial does not
use the runtime debugging hooks, but instead acquires information
from the loaded trace file.
When the trace window is closed,
.I XMPI
returns to snapshot mode if there is a running application.
.SH RESOURCES
.I XMPI
defines the following application resources.
.PP
.TP 20
XMPI.helpCmd
command that is run to provide help.
The default is typically a command which fires up a Web browser to
view a help page.
You should change this to invoke your favourite browser.
.TP
XMPI.rankFont
process rank font in hexagon
.TP
XMPI.msgFont
total message count font in hexagon (may need to be adjusted to fit
inside message icon)
.TP
XMPI.lcomCol
color used to highlight the processes in an intracommunicator
or in the local group of an intercommunicator
.TP
XMPI.rcomCol
color used to highlight the processes in the remote group of an
intercommunicator
.TP
XMPI.bandCol
color used for the zoom selection rubber band
.TP
XMPI.bandDash
if True, use a dashed line rubber band to show the zoom selection;
otherwise use a solid line
.TP
XMPI.bandWidth
width of the zoom selection rubber band
.PP
.I XMPI
gets important default resources from the application defaults file, XMPI.
If this file is not installed in the X11 default directory, its directory
can be added to the XAPPLRESDIR environment variable.
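.PP
As an example, the highlight colors and the zoom rubber band could be
customized with entries such as the following in a user resource file
(commonly
.IR ~/.Xdefaults ).
The values shown are illustrative assumptions, not the shipped defaults:
.PP
.RS
.nf
XMPI.lcomCol:   yellow
XMPI.rcomCol:   cyan
XMPI.bandCol:   red
XMPI.bandDash:  True
XMPI.bandWidth: 2
.fi
.RE
.PP
If the application defaults file is installed in a non-standard
directory, a Bourne shell user might set the variable as follows,
where the path is hypothetical:
.PP
.RS
.nf
XAPPLRESDIR=/usr/local/lam/app-defaults/
export XAPPLRESDIR
.fi
.RE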
.SH LIMITATIONS
An application must be started by
.I XMPI
to be monitored by it.
.PP
When using the fast client-to-client communication mode, process states
in snapshot mode are always shown as running and no useful information
is shown in the process focus windows.
.PP
.I XMPI
uses lamclean(1).
Errors reported by this tool will still print to standard output.
A shorter message will appear in an
.I XMPI
error dialog.
.SH SEE ALSO
mpimsg(1), mpirun(1), mpitask(1), lamtrace(1)