<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<!--Converted with LaTeX2HTML 96.1-h (September 30, 1996) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds -->
<HTML>
<HEAD>
<TITLE>Obtaining High Performance with ScaLAPACK Codes</TITLE>
<META NAME="description" CONTENT="Obtaining High Performance with ScaLAPACK Codes">
<META NAME="keywords" CONTENT="slug">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<LINK REL=STYLESHEET HREF="slug.css">
</HEAD>
<BODY LANG="EN" >
<BR> <P>
<H2><A NAME="SECTION04531000000000000000">Obtaining High Performance with ScaLAPACK Codes</A></H2>
<P>
We suggest the following approach to obtain high performance with
ScaLAPACK codes:
<UL>
<LI> Use the best BLAS and BLACS
libraries available.
<LI> Start with a standard data distribution.
  <UL>
<LI>A square processor grid (<IMG WIDTH=123 HEIGHT=30 ALIGN=MIDDLE ALT="P_r = P_c = sqrt(P)" SRC="img413.gif">) if <IMG WIDTH=45 HEIGHT=26 ALIGN=MIDDLE ALT="P &gt;= 9" SRC="img414.gif"><A NAME="4159">&#160;</A>
<LI>A one-dimensional processor grid (P<IMG WIDTH=6 HEIGHT=7 ALIGN=MIDDLE ALT="r" SRC="img15.gif">=1, P<IMG WIDTH=6 HEIGHT=7 ALIGN=MIDDLE ALT="c" SRC="img16.gif">=P) if <I>P</I> &lt; 9
<LI>Block size = 64<A NAME="4163">&#160;</A>
  </UL>
<LI> Determine whether reasonable performance is being achieved.
<LI> Identify the performance bottleneck(s), if any.
<LI> Tune the distribution or routine parameters to improve
       performance further.
</UL>
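<P>
The starting distribution listed above can be sketched as a small helper.
This is only an illustrative sketch in Python, not part of ScaLAPACK: the
function name <TT>standard_distribution</TT> and its result keys are
hypothetical, and rounding the grid side down to the integer part of the
square root when <I>P</I> is not a perfect square is an assumption (the
leftover processes are reported as unused).
<PRE>
```python
import math


def standard_distribution(p: int) -> dict:
    """Suggest a starting process grid and block size for p processes.

    Hypothetical helper: follows the rule of thumb in the text
    (1 x P grid for fewer than 9 processes, square grid otherwise,
    block size 64).
    """
    if not p >= 1:
        raise ValueError("p must be a positive integer")

    if p >= 9:
        # Assumption: for a non-square p, use the largest square grid
        # that fits and leave the remaining processes idle.
        side = math.isqrt(p)
        nprow, npcol = side, side
    else:
        # One-dimensional grid: a single process row, p process columns.
        nprow, npcol = 1, p

    return {
        "nprow": nprow,
        "npcol": npcol,
        "block_size": 64,
        "unused": p - nprow * npcol,
    }


if __name__ == "__main__":
    for nprocs in (4, 12, 16):
        print(nprocs, standard_distribution(nprocs))
```
</PRE>
<P>
With 12 processes, for instance, this sketch selects a 3 x 3 grid and
leaves 3 processes unused, which is the kind of gap the next paragraph
refers to.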
<P>
The standard data distribution will typically achieve 25-50% 
of the peak performance possible  (depending 
in part on how many processors are ignored, i.e., the difference
between <IMG WIDTH=38 HEIGHT=30 ALIGN=MIDDLE ALT="tex2html_wrap_inline17061" SRC="img415.gif"> and <IMG WIDTH=27 HEIGHT=32 ALIGN=MIDDLE ALT="tex2html_wrap_inline17063" SRC="img416.gif">).  We do not
recommend experimenting with different data distributions until
performance that is acceptable (or nearly so) has been achieved.
If each individual node requires a block size larger than 64 to 
achieve near-peak performance on local matrix-matrix multiply,
the block size may have to be increased.  This is unlikely to be
necessary, however, unless each node is a shared-memory multiprocessor
with more than four processors.
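<P>
The two checks in the paragraph above can be sketched as follows. This is a
minimal sketch under stated assumptions, not a ScaLAPACK facility: both
function names are hypothetical, the measured rates are values the user
supplies from their own timings, and the 0.9 cutoff used for "near-peak" is
an arbitrary illustrative choice.
<PRE>
```python
def fraction_of_peak(measured_flops, peak_flops_per_process, p):
    """Measured rate as a fraction of the aggregate peak of all p processes.

    Idle processes still count toward the aggregate peak, so ignoring
    processes lowers the fraction.
    """
    if not (p >= 1 and peak_flops_per_process > 0):
        raise ValueError("need p >= 1 and a positive per-process peak")
    return measured_flops / (peak_flops_per_process * p)


def pick_block_size(local_gemm_rates, near_peak=0.9, default=64):
    """Choose a block size from measured local matrix-multiply rates.

    local_gemm_rates maps a block size to the rate measured on one node.
    Returns the smallest block size of at least `default` whose rate is
    within `near_peak` of the best rate observed; falls back to `default`.
    The 0.9 cutoff is an assumption, not a value from the text.
    """
    best = max(local_gemm_rates.values())
    candidates = sorted(
        nb
        for nb, rate in local_gemm_rates.items()
        if nb >= default and rate >= near_peak * best
    )
    return candidates[0] if candidates else default


if __name__ == "__main__":
    # 4 Gflop/s measured on 16 processes of 1 Gflop/s peak each.
    print(fraction_of_peak(4.0e9, 1.0e9, 16))
    # Block size 64 already reaches 95% of the best observed local rate.
    print(pick_block_size({32: 50.0, 64: 95.0, 128: 100.0}))
```
</PRE>
<P>
A fraction at or above roughly 0.25 falls within the 25-50% range quoted
above for the standard distribution; a block size other than 64 is returned
only when the local timings show that 64 falls short of near-peak.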
<P>
<BR> <HR>
<P><ADDRESS>
<I>Susan Blackford <BR>
Tue May 13 09:21:01 EDT 1997</I>
</ADDRESS>
</BODY>
</HTML>