<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<!--Converted with LaTeX2HTML 96.1-h (September 30, 1996) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds -->
<HTML>
<HEAD>
<TITLE>New Sources of Error in Parallel Numerical Computations</TITLE>
<META NAME="description" CONTENT="New Sources of Error in Parallel Numerical Computations">
<META NAME="keywords" CONTENT="slug">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<LINK REL=STYLESHEET HREF="slug.css">
</HEAD>
<BODY LANG="EN" >
 <A NAME="tex2html3890" HREF="node135.html"><IMG WIDTH=37 HEIGHT=24 ALIGN=BOTTOM ALT="next" SRC="http://www.netlib.org/utk/icons/next_motif.gif"></A> <A NAME="tex2html3888" HREF="node132.html"><IMG WIDTH=26 HEIGHT=24 ALIGN=BOTTOM ALT="up" SRC="http://www.netlib.org/utk/icons/up_motif.gif"></A> <A NAME="tex2html3882" HREF="node133.html"><IMG WIDTH=63 HEIGHT=24 ALIGN=BOTTOM ALT="previous" SRC="http://www.netlib.org/utk/icons/previous_motif.gif"></A> <A NAME="tex2html3892" HREF="node1.html"><IMG WIDTH=65 HEIGHT=24 ALIGN=BOTTOM ALT="contents" SRC="http://www.netlib.org/utk/icons/contents_motif.gif"></A> <A NAME="tex2html3893" HREF="node190.html"><IMG WIDTH=43 HEIGHT=24 ALIGN=BOTTOM ALT="index" SRC="http://www.netlib.org/utk/icons/index_motif.gif"></A> <BR>
<B> Next:</B> <A NAME="tex2html3891" HREF="node135.html">How to Measure Errors</A>
<B>Up:</B> <A NAME="tex2html3889" HREF="node132.html">Accuracy and Stability</A>
<B> Previous:</B> <A NAME="tex2html3883" HREF="node133.html">Sources of Error in </A>
<BR> <P>
<H1><A NAME="SECTION04620000000000000000">New Sources of Error in Parallel Numerical Computations</A></H1>
<P>
<A NAME="sec_Hetero">&#160;</A>
<P>
An important difference between ScaLAPACK and LAPACK is that
a parallel computing environment, possibly consisting of
a heterogeneous collection of processors,
introduces new sources of error not found in the 
serial environment in which LAPACK runs.
These errors are not specific to ScaLAPACK; they can afflict any parallel algorithm that uses
floating-point arithmetic.
For example, consider the following pseudocode, executed in parallel by
several processors:
<P>
<PRE><TT>
   <I>s</I> = global_sum(<I>x</I>)    ... each processor receives the sum <I>s</I> of global array <I>x</I>
   if <I>s</I> &lt; <I>thresh</I> then
      return my part of answer 1
   else
      do more computations
      return my part of answer 2
   end if
</TT></PRE>
<P>
It is possible for the value of <I>s</I> to differ from processor to processor;
we call this <EM>incoherence</EM>.<A NAME="4496">&#160;</A>
This can happen if the floating-point arithmetic varies
from processor to processor (we call this <EM>heterogeneity</EM>),<A NAME="4498">&#160;</A> since
processors may not even share the same set of floating-point numbers.
The value of <I>s</I> can also vary if global_sum accumulates the sum
in different orders on different processors,
since floating-point addition is not associative.
In either case, the
test <I>s</I> &lt; <I>thresh</I> may be true on one processor but not on another, so
that the program may inconsistently return answer 1 on some processors
and answer 2 on others. If the "more computations" include
communication with synchronization, even deadlock could result.<A NAME="4499">&#160;</A>
<A NAME="4500">&#160;</A>
<P>
Deadlock can also result if the floating-point numbers communicated from
one processor to another cause fatal floating-point errors on the receiving
processor. For example, if an IBM RS/6000, running in its default
mode,  sends a message containing a denormalized number 
[<A HREF="node189.html#ieee754">7</A>, <A HREF="node189.html#ieee854">8</A>] <A NAME="4502">&#160;</A>
to a DEC Alpha running in its default mode,
then the DEC Alpha aborts [<A HREF="node189.html#lawn112">19</A>].<A NAME="tex2html1139" HREF="footnode.html#8008"><IMG  ALIGN=BOTTOM ALT="gif" SRC="http://www.netlib.org/utk/icons/foot_motif.gif"></A>
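<P>
To make "denormalized number" concrete: in IEEE double precision these are the nonzero values smaller in magnitude than the smallest normalized number, produced by gradual underflow. The Python sketch below uses a hypothetical helper (not a BLACS or ScaLAPACK routine) to check whether a message buffer contains such values, which are the ones that could cause the failure described above on a receiver that cannot handle them.

```python
import math
import sys

def is_denormalized(v: float) -> bool:
    """True if v is a nonzero, finite IEEE double whose magnitude lies below
    the smallest normalized double (sys.float_info.min)."""
    return v != 0.0 and math.isfinite(v) and abs(v) < sys.float_info.min

smallest_normal = sys.float_info.min   # 2**-1022, about 2.2e-308
halved = smallest_normal / 2.0         # gradual underflow: a denormalized number

# A hypothetical message buffer about to be sent to another processor.
message = [1.0, smallest_normal, halved, 0.0]
flags = [is_denormalized(v) for v in message]
print(flags)  # [False, False, True, False]
```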
<P>
It is also possible for global_sum to compute the same <I>s</I> on all processors
but compute a different <I>s</I> from run to run of the program,
for example, if global_sum computes the sum in a nondeterministic order
on one processor and broadcasts the result to all processors.
We call this <EM>nonrepeatability</EM>.<A NAME="4506">&#160;</A>
If this happens, debugging the overall code can be more difficult, because a failure
observed in one run may not recur when the program is rerun.
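<P>
The Python sketch below simulates this situation; it is an illustration under stated assumptions, not how the BLACS implement global sums. A hypothetical root processor adds the contributions in the order in which the messages arrive, which may vary from run to run, and broadcasts the result. Each run is coherent, but two runs can disagree. Buffering the contributions and always adding them in rank order is one way to restore repeatability, at the cost of waiting for all messages before adding.

```python
from itertools import permutations

# Hypothetical per-processor contributions to the global sum.
contribs = [1.0e16, 1.0, -1.0e16, 1.0]
NPROCS = len(contribs)

def sum_in_arrival_order(arrival_order):
    """Root adds contributions as they arrive, then broadcasts one value."""
    s = 0.0
    for rank in arrival_order:
        s += contribs[rank]
    return [s] * NPROCS  # every processor receives identical bits

def sum_in_rank_order(arrival_order):
    """Root buffers contributions, then always adds them in rank order."""
    received = {rank: contribs[rank] for rank in arrival_order}
    s = 0.0
    for rank in range(NPROCS):
        s += received[rank]
    return [s] * NPROCS

# Two runs in which the messages happen to arrive in different orders.
run_a = sum_in_arrival_order([0, 1, 2, 3])
run_b = sum_in_arrival_order([0, 2, 1, 3])
print(run_a, run_b)  # coherent within each run, but run_a != run_b

# With a fixed order, every possible arrival order gives the same answer.
fixed = {tuple(sum_in_rank_order(list(p))) for p in permutations(range(NPROCS))}
print(fixed)  # a single result
```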
<P>
Coherence<A NAME="4507">&#160;</A><A NAME="4508">&#160;</A> and
repeatability<A NAME="4509">&#160;</A><A NAME="4510">&#160;</A>
are independent properties of an algorithm.
It is possible in principle for an algorithm running on a particular 
platform to be incoherent and repeatable, coherent and nonrepeatable, 
or any other combination. On a different platform, the same algorithm
may have different properties.
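<P>
As an illustration of one such combination, the hypothetical Python sketch below models a global sum in which every processor gathers all contributions and adds them in a fixed order that starts at its own rank. The computation is deterministic, so every run is identical (repeatable), but different processors can obtain different sums (incoherent).

```python
# Hypothetical contributions, one per processor.
contribs = [1.0e16, 1.0, -1.0e16, 1.0]
NPROCS = len(contribs)

def rotated_sum(rank):
    """Processor `rank` adds contributions rank, rank+1, ... (mod NPROCS)."""
    s = 0.0
    for k in range(NPROCS):
        s += contribs[(rank + k) % NPROCS]
    return s

def one_run():
    """The value of s seen by each processor in one run."""
    return [rotated_sum(rank) for rank in range(NPROCS)]

first, second = one_run(), one_run()
print(first)            # processors disagree: incoherent
print(first == second)  # True: repeatable
```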
<P>
Reference [<A HREF="node189.html#lawn112">19</A>]
contains a more extensive discussion of these possible errors.
<P>
A single run of a ScaLAPACK routine is designed to be as reliable<A NAME="4512">&#160;</A> as LAPACK: 
errors due to incoherence cannot occur as long as ScaLAPACK is 
executed on a <EM>homogeneous network</EM><A NAME="4514">&#160;</A> of
processors, that is, one satisfying all of the following conditions:
<UL>
<LI> The processors are completely identical. This also means that
relevant flags, like those controlling the way overflow and underflow 
are handled in IEEE floating-point arithmetic,
<A NAME="4516">&#160;</A>
must be identical.
<LI> The communication library used by the BLACS must only
"copy bits"; it must not modify any floating-point numbers (for example, by translating them
to a different internal floating-point format, as XDR [<A HREF="node189.html#SunSoft:XDR">111</A>] 
may do).<A NAME="4518">&#160;</A><A NAME="4519">&#160;</A>
<LI> The identical ScaLAPACK object code must be executed by each processor.
</UL>
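<P>
The second condition can be illustrated with Python's <TT>struct</TT> module. Copying the 8-byte IEEE image of a double reproduces it exactly, whereas a translation through a different format may not. In this sketch, a round trip through IEEE single precision is only a hypothetical stand-in for such a translation; it is not what XDR does.

```python
import struct

def copy_bits(v: float) -> float:
    """Send v as its raw 8-byte IEEE double image and unpack it unchanged."""
    return struct.unpack("<d", struct.pack("<d", v))[0]

def translate_lossy(v: float) -> float:
    """Hypothetical translation: pass v through the 4-byte single format."""
    return struct.unpack("<f", struct.pack("<f", v))[0]

x = 0.1
sent_exact = copy_bits(x)
sent_translated = translate_lossy(x)

# Compare bit patterns, not just printed values.
print(struct.pack("<d", sent_exact) == struct.pack("<d", x))       # True
print(struct.pack("<d", sent_translated) == struct.pack("<d", x))  # False
```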
<P>
The above conditions guarantee that a single ScaLAPACK call is as reliable<A NAME="4521">&#160;</A>
as its LAPACK counterpart.
If, in addition, identical answers from one run to another 
are desired (i.e., <EM>repeatability</EM>),<A NAME="4523">&#160;</A>
this can be guaranteed at runtime by calling 
BLACS_SET<A NAME="4524">&#160;</A> to select a topology that enforces repeatability<A NAME="4525">&#160;</A> of the BLACS and of the ScaLAPACK
routines that use them
(see the BLACS users guide&nbsp;[<A HREF="node189.html#lawn94">54</A>] for details).
<P>
Maintaining coherence<A NAME="4527">&#160;</A> on a heterogeneous network<A NAME="4528">&#160;</A> is harder, and not always
possible. If floating-point formats differ 
(say, between a Cray C90 and an IBM RS/6000, which uses IEEE arithmetic), 
there is no cost-effective way to guarantee coherence.
If floating-point formats are the same, however, operations such as
global sums can accumulate the result on one processor and broadcast it,
so that every processor receives the same bits; this guarantees coherence
(except for the problem of DEC Alphas and denormalized
numbers mentioned above).
The BLACS do this, except when using the 
"bidirectional exchange" topology. One can avoid 
"bidirectional exchange", and so guarantee coherence whenever possible, 
by calling BLACS_SET<A NAME="4529">&#160;</A> to enforce coherence<A NAME="4530">&#160;</A>
(see the BLACS users guide&nbsp;[<A HREF="node189.html#lawn94">54</A>] for details).
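<P>
The contrast between the two strategies can be sketched in Python. The model is hypothetical: all processors share a storage format, but processor 3's arithmetic differs (simulated by rounding every sum to single precision), and "bidirectional exchange" is modeled, as an assumption, by a butterfly (recursive-doubling) exchange in which every processor performs its own additions. This is not the BLACS source code.

```python
import struct

def add_double(a: float, b: float) -> float:
    return a + b

def add_rounded_single(a: float, b: float) -> float:
    """Hypothetical 'different' arithmetic: round each sum to IEEE single."""
    return struct.unpack("<f", struct.pack("<f", a + b))[0]

# Same storage format everywhere, but processor 3 does arithmetic differently.
ADD = [add_double, add_double, add_double, add_rounded_single]
contribs = [0.1, 0.2, 0.3, 0.4]
NPROCS = len(contribs)  # a power of two, as the exchange pattern assumes

def bidirectional_exchange():
    """Butterfly all-reduce: at each step, rank r swaps partial sums with
    rank r XOR step and adds them using its own arithmetic."""
    partial = list(contribs)
    step = 1
    while step < NPROCS:
        partial = [ADD[r](partial[r], partial[r ^ step]) for r in range(NPROCS)]
        step *= 2
    return partial

def reduce_then_broadcast(root=0):
    """The root alone accumulates, then broadcasts the same bits to all."""
    s = 0.0
    for rank in range(NPROCS):
        s = ADD[root](s, contribs[rank])
    return [s] * NPROCS

print(bidirectional_exchange())   # processors can disagree
print(reduce_then_broadcast())    # all processors agree
print(reduce_then_broadcast(3))   # all agree, even using root 3's arithmetic
```

Accumulating on one processor does not make the answer more accurate; it only ensures that every processor sees the same answer.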
<P>
Some ScaLAPACK routines, namely PxGESVD and PxSYEV, are guaranteed to work
only on homogeneous networks. These routines perform
large numbers of redundant calculations on all processors and depend on
the results of these calculations being identical. There are too many
of these calculations to compute them all on one processor
and broadcast the results cost-effectively.
<P>
The user may wonder why ScaLAPACK and the BLACS are not designed to
guarantee coherence and repeatability in the most general possible situations,
so that calling BLACS_SET would not be necessary.  
The reason is that the possible bugs described above are quite rare, 
and so ScaLAPACK and the BLACS were designed to maximize performance instead.
Provided the mere sending of floating-point numbers does not cause a
fatal error, these bugs cannot occur at all in most ScaLAPACK routines, 
because branches depending on a supposedly identical floating-point value 
like <I>s</I> do not occur. 
For most other ScaLAPACK routines where such branches do occur,
we have not seen these bugs despite extensive testing, including attempts 
to cause them to occur. 
Complete understanding and cost-effective 
elimination of such possible bugs are future work.
<P>
In the meantime, to get repeatability when running on a homogeneous network,
we recommend calling BLACS_SET<A NAME="4532">&#160;</A> as described above when using the following
ScaLAPACK drivers: PxGESVX, PxPOSVX, PxSYEV, PxSYEVX, PxGESVD, and PxSYGVX.
<A NAME="4533">&#160;</A><A NAME="4534">&#160;</A><A NAME="4535">&#160;</A><A NAME="4536">&#160;</A>
<A NAME="4537">&#160;</A><A NAME="4538">&#160;</A><A NAME="4539">&#160;</A><A NAME="4540">&#160;</A>
<A NAME="4541">&#160;</A><A NAME="4542">&#160;</A>
<A NAME="4543">&#160;</A><A NAME="4544">&#160;</A><A NAME="4545">&#160;</A><A NAME="4546">&#160;</A>
<A NAME="4547">&#160;</A><A NAME="4548">&#160;</A><A NAME="4549">&#160;</A><A NAME="4550">&#160;</A>
<A NAME="4551">&#160;</A><A NAME="4552">&#160;</A>
<P>
<HR>
<P><ADDRESS>
<I>Susan Blackford <BR>
Tue May 13 09:21:01 EDT 1997</I>
</ADDRESS>
</BODY>
</HTML>