<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<!--Converted with LaTeX2HTML 98.2 beta6 (August 14th, 1998)
original version by:  Nikos Drakos, CBLU, University of Leeds
* revised and updated by:  Marcus Hennecke, Ross Moore, Herb Swan
* with significant contributions from:
  Jens Lippmann, Marek Rouchal, Martin Wilck and others -->
<HTML>
<HEAD>
<TITLE>Poor Performance</TITLE>
<META NAME="description" CONTENT="Poor Performance">
<META NAME="keywords" CONTENT="lug_l2h">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<LINK REL="STYLESHEET" HREF="lug_l2h.css">
<LINK REL="previous" HREF="node139.html">
<LINK REL="up" HREF="node132.html">
<LINK REL="next" HREF="node141.html">
</HEAD>
<BODY >
<!--Navigation Panel-->
<A NAME="tex2html6187"
 HREF="node141.html">
<IMG WIDTH="37" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next"
 SRC="next_motif.png"></A> 
<A NAME="tex2html6181"
 HREF="node132.html">
<IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up"
 SRC="up_motif.png"></A> 
<A NAME="tex2html6177"
 HREF="node139.html">
<IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous"
 SRC="previous_motif.png"></A> 
<A NAME="tex2html6183"
 HREF="node1.html">
<IMG WIDTH="65" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="contents"
 SRC="contents_motif.png"></A> 
<A NAME="tex2html6185"
 HREF="node152.html">
<IMG WIDTH="43" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="index"
 SRC="index_motif.png"></A> 
<BR>
<B> Next:</B> <A NAME="tex2html6188"
 HREF="node141.html">Index of Driver and</A>
<B> Up:</B> <A NAME="tex2html6182"
 HREF="node132.html">Troubleshooting</A>
<B> Previous:</B> <A NAME="tex2html6178"
 HREF="node139.html">Wrong Results</A>
 &nbsp; <B>  <A NAME="tex2html6184"
 HREF="node1.html">Contents</A></B> 
 &nbsp; <B>  <A NAME="tex2html6186"
 HREF="node152.html">Index</A></B> 
<BR>
<BR>
<!--End of Navigation Panel-->

<H1><A NAME="SECTION03760000000000000000">
Poor Performance</A>
</H1>

<P>
LAPACK relies on an efficient implementation of the BLAS<A NAME="21134"></A>.
We have tried to make
the performance of LAPACK ``transportable'' by performing most of
the computation within the Level 1, 2, and 3 BLAS, and by isolating
all of the machine-dependent tuning parameters
in a single integer function ILAENV<A NAME="21135"></A>.

<P>
To avoid poor performance<A NAME="21136"></A> from LAPACK 
routines, note the
following recommendations<A NAME="21137"></A>:

<P>
<DL>
<DT><STRONG>BLAS:</STRONG>
<DD>Use machine-specific optimized BLAS whenever they are available.
Many manufacturers and research institutions have developed, or are
developing, efficient versions of the BLAS for particular machines.
The BLAS enable LAPACK routines to achieve high performance
with transportable software.  Users are urged to determine whether such an
implementation of the BLAS exists for their platform and, if one exists,
to use it.
If such a
machine-specific implementation of the BLAS does not exist for a particular
platform, one should consider installing a publicly available
set of BLAS that requires only an efficient implementation of the
matrix-matrix multiply BLAS routine xGEMM.  Examples of such
implementations are [<A
 HREF="node151.html#dayde94a">21</A>,<A
 HREF="node151.html#kagstrom95b">72</A>].  A machine-specific and
efficient implementation of the routine xGEMM can be automatically
generated by publicly available software such as [<A
 HREF="node151.html#atlas_sc98">102</A>] and
[<A
 HREF="node151.html#lawn111">15</A>].
A reference implementation of the Fortran77 BLAS is available
from the <EM>blas</EM> directory on <EM>netlib</EM>.  These routines are not
expected to
perform as well as a specially tuned implementation on most high-performance
computers, and on some machines they may perform much worse.
They do, however, allow users to run LAPACK software on machines
that do not offer any other implementation of the BLAS.

<P>
<DT><STRONG>ILAENV:</STRONG>
<DD>For best performance, the LAPACK routine ILAENV
should be set with optimal tuning parameters for the machine being used.
The version of ILAENV provided with LAPACK supplies default values
for these parameters that give good, but not optimal, average-case
performance on a range of existing machines.
Notably, the performance of xHSEQR is particularly sensitive to
<A NAME="21144"></A><A NAME="21145"></A><A NAME="21146"></A><A NAME="21147"></A>
<A NAME="21148"></A>
the correct choice of block parameters; the same applies to the driver
routines which call xHSEQR, namely xGEES, xGEESX, xGEEV and xGEEVX.
<A NAME="21149"></A><A NAME="21150"></A><A NAME="21151"></A><A NAME="21152"></A>
<A NAME="21153"></A><A NAME="21154"></A><A NAME="21155"></A><A NAME="21156"></A>
<A NAME="21157"></A><A NAME="21158"></A><A NAME="21159"></A><A NAME="21160"></A>
<A NAME="21161"></A><A NAME="21162"></A><A NAME="21163"></A><A NAME="21164"></A>
Further details on setting parameters in ILAENV are found in
Chapter <A HREF="node129.html#chapinstall">6</A>.
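
<P>
As a minimal sketch, the following program prints the tuning
parameters that the installed version of ILAENV returns for one routine.
The program name ENVCHK, the choice of DGEQRF and the problem size are
illustrative assumptions, not taken from this guide. The values printed
depend on the installation.
<PRE>
      PROGRAM ENVCHK
*     Illustrative only: print the tuning parameters that the
*     installed ILAENV returns for DGEQRF on an M-by-N problem.
*     M and N are arbitrary values chosen for this sketch.
      INTEGER            M, N
      PARAMETER          ( M = 1000, N = 1000 )
      INTEGER            NB, NBMIN, NX
      INTEGER            ILAENV
      EXTERNAL           ILAENV
*     ISPEC = 1: the optimal block size.
      NB = ILAENV( 1, 'DGEQRF', ' ', M, N, -1, -1 )
*     ISPEC = 2: the minimum block size for which the blocked
*     code should be used.
      NBMIN = ILAENV( 2, 'DGEQRF', ' ', M, N, -1, -1 )
*     ISPEC = 3: the crossover point below which the unblocked
*     code should be used.
      NX = ILAENV( 3, 'DGEQRF', ' ', M, N, -1, -1 )
      WRITE( *, * ) 'NB    = ', NB
      WRITE( *, * ) 'NBMIN = ', NBMIN
      WRITE( *, * ) 'NX    = ', NX
      END
</PRE>
Running such a program before and after modifying ILAENV is one way to
confirm that the intended values are actually in effect.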

<P>
<DT><STRONG>LWORK <IMG
 WIDTH="18" HEIGHT="30" ALIGN="MIDDLE" BORDER="0"
 SRC="img913.png"
 ALT="$\geq$">
WORK(1):</STRONG>
<DD>The performance of some routines depends on the amount of workspace
supplied. In such cases,
an argument, usually called WORK, is
provided, accompanied by an integer argument LWORK specifying its
length as a linear array. 
On exit, WORK(1) returns the amount of workspace required to use
the optimal tuning parameters.
If LWORK <B>&lt;</B> WORK(1), then insufficient workspace was provided
to use the optimal parameters, and performance may be lower than
the routine could otherwise achieve.
One should therefore check that LWORK <IMG
 WIDTH="18" HEIGHT="30" ALIGN="MIDDLE" BORDER="0"
 SRC="img913.png"
 ALT="$\geq$">
WORK(1) on return from
any LAPACK routine that requires user-supplied workspace, to confirm that
enough workspace was provided for optimal performance.
<A NAME="21166"></A>
Note that the computation is performed correctly, even if the amount of
workspace is less than optimal, unless LWORK is reported as an
invalid value by a call to XERBLA as described in Section&nbsp;<A HREF="node136.html#secfailures">7.3</A>.
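
<P>
The check can be sketched as follows, using the QR factorization
routine DGEQRF with the minimum permitted workspace LWORK = N.
The program name WRKCHK, the matrix dimensions and the test data are
illustrative assumptions. With the default version of XERBLA the program
stops on an illegal argument, so the first branch is reached only if
XERBLA has been replaced by a version that returns.
<PRE>
      PROGRAM WRKCHK
*     Illustrative only: factor a matrix with DGEQRF using the
*     minimum workspace, then compare LWORK with WORK(1).
      INTEGER            M, N, LDA, LWORK
      PARAMETER          ( M = 200, N = 100, LDA = M, LWORK = N )
      INTEGER            I, J, INFO
      DOUBLE PRECISION   A( LDA, N ), TAU( N ), WORK( LWORK )
      EXTERNAL           DGEQRF
*     Fill A with arbitrary test data.
      DO 20 J = 1, N
         DO 10 I = 1, M
            A( I, J ) = 1.0D0 / DBLE( I + J - 1 )
   10    CONTINUE
   20 CONTINUE
      CALL DGEQRF( M, N, A, LDA, TAU, WORK, LWORK, INFO )
      IF( INFO.LT.0 ) THEN
*        Reached only if XERBLA returns instead of stopping.
         WRITE( *, * ) 'Illegal argument number ', -INFO
      ELSE IF( LWORK.LT.INT( WORK( 1 ) ) ) THEN
*        The factorization is correct, but a larger WORK array
*        would allow the optimal block size to be used.
         WRITE( *, * ) 'LWORK = ', LWORK,
     $      ' is below the optimal ', INT( WORK( 1 ) )
      ELSE
         WRITE( *, * ) 'LWORK is sufficient for optimal performance'
      END IF
      END
</PRE>
If the second message appears, increasing LWORK (and the declared length
of WORK) to the reported value should remove the workspace limitation.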

<P>
<DT><STRONG>xLAMCH:</STRONG>
<DD>Users should beware of the high cost of the <EM>first</EM>
call to the LAPACK auxiliary routine xLAMCH,
<A NAME="21169"></A>
which computes
machine characteristics such as epsilon and the
smallest invertible number (the smallest positive number whose
reciprocal does not overflow).
The first call dynamically determines a set of parameters defining
the machine's arithmetic, but these values are saved and subsequent
calls incur only a trivial cost.
For performance testing, the initial cost can be hidden by 
including a call to xLAMCH in the main program, before any calls to
LAPACK routines that will be timed.  A sample use of SLAMCH<A NAME="21170"></A> is
<PRE>
      XXXXXX = SLAMCH( 'P' )
</PRE>
or in double precision:
<PRE>
      XXXXXX = DLAMCH( 'P' )
</PRE>
A cleaner but less portable solution is for the installer to
save the values computed by xLAMCH for a specific machine
and create a new version of xLAMCH with these constants set in
DATA statements, taking care that no accuracy is lost in the
translation.
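
<P>
As a starting point for recording these values, the following sketch
prints several of the quantities returned by DLAMCH. The program name
MCHPRT and the output format are illustrative assumptions. The format
prints 17 significant decimal digits, which is enough to reproduce the
values exactly only if the machine uses IEEE double precision arithmetic.
On other machines more digits may be needed.
<PRE>
      PROGRAM MCHPRT
*     Illustrative only: print machine parameters computed by
*     DLAMCH so that they can be recorded by the installer.
      DOUBLE PRECISION   EPS, SFMIN, BASE, PREC, RMIN, RMAX
      DOUBLE PRECISION   DLAMCH
      EXTERNAL           DLAMCH
*     The first call bears the one-time cost of determining the
*     machine parameters; later calls return the saved values.
      EPS = DLAMCH( 'E' )
      SFMIN = DLAMCH( 'S' )
      BASE = DLAMCH( 'B' )
      PREC = DLAMCH( 'P' )
      RMIN = DLAMCH( 'U' )
      RMAX = DLAMCH( 'O' )
      WRITE( *, 9999 ) 'Epsilon             (E)', EPS
      WRITE( *, 9999 ) 'Safe minimum        (S)', SFMIN
      WRITE( *, 9999 ) 'Base                (B)', BASE
      WRITE( *, 9999 ) 'Precision, EPS*BASE (P)', PREC
      WRITE( *, 9999 ) 'Underflow threshold (U)', RMIN
      WRITE( *, 9999 ) 'Overflow threshold  (O)', RMAX
 9999 FORMAT( 1X, A, ' = ', 1P, D25.16 )
      END
</PRE>
The single precision version is analogous, with SLAMCH and REAL
variables in place of DLAMCH and DOUBLE PRECISION.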
</DL>

<P>

<P>
<HR>
<ADDRESS>
<I>Susan Blackford</I>
<BR><I>1999-10-01</I>
</ADDRESS>
</BODY>
</HTML>