<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<!--Converted with LaTeX2HTML 96.1-h (September 30, 1996) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds -->
<HTML>
<HEAD>
<TITLE>ScaLAPACK Performance</TITLE>
<META NAME="description" CONTENT="ScaLAPACK Performance">
<META NAME="keywords" CONTENT="slug">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<LINK REL=STYLESHEET HREF="slug.css">
</HEAD>
<BODY LANG="EN" >
<A NAME="tex2html3625" HREF="node114.html"><IMG WIDTH=37 HEIGHT=24 ALIGN=BOTTOM ALT="next" SRC="http://www.netlib.org/utk/icons/next_motif.gif"></A> <A NAME="tex2html3623" HREF="node108.html"><IMG WIDTH=26 HEIGHT=24 ALIGN=BOTTOM ALT="up" SRC="http://www.netlib.org/utk/icons/up_motif.gif"></A> <A NAME="tex2html3617" HREF="node112.html"><IMG WIDTH=63 HEIGHT=24 ALIGN=BOTTOM ALT="previous" SRC="http://www.netlib.org/utk/icons/previous_motif.gif"></A> <A NAME="tex2html3627" HREF="node1.html"><IMG WIDTH=65 HEIGHT=24 ALIGN=BOTTOM ALT="contents" SRC="http://www.netlib.org/utk/icons/contents_motif.gif"></A> <A NAME="tex2html3628" HREF="node190.html"><IMG WIDTH=43 HEIGHT=24 ALIGN=BOTTOM ALT="index" SRC="http://www.netlib.org/utk/icons/index_motif.gif"></A> <BR>
<B> Next:</B> <A NAME="tex2html3626" HREF="node114.html">Performance of Selected BLACS </A>
<B>Up:</B> <A NAME="tex2html3624" HREF="node108.html">PerformancePortability and Scalability</A>
<B> Previous:</B> <A NAME="tex2html3618" HREF="node112.html">Parallel Efficiency</A>
<BR> <P>
<H2><A NAME="SECTION04524000000000000000">ScaLAPACK Performance</A></H2>
<A NAME="secscaperf"> </A>
<P>
In this section, we present performance data for Version 1.4
of ScaLAPACK on four distributed memory
computers and two networks of workstations.
The four distributed memory computers are
the Cray T3E computer, the IBM Scalable POWERparallel 2 computer,
the Intel XP/S MP Paragon computer, and the Intel ASCI Option Red
Supercomputer. One of the networks of workstations
consists of Sun Ultra Enterprise 2 workstations (Model 2170s)
connected by switched ATM.
The other, the Berkeley NOW [<A HREF="node189.html#berkeleynow">34</A>]<A NAME="3757"> </A>,
consists of 100+ Sun UltraSPARC-1 workstations connected by
40+ Myricom crossbar switches and LANai 4.1 network interface cards.
ScaLAPACK on the NOW uses the MPI BLACS, where the MPI is a port of the
freely available MPICH reference implementation. This MPI uses Active
Messages<A NAME="3758"> </A>
as its underlying communication layer. Active Messages [<A HREF="node189.html#am2">98</A>] provide
ultra-lightweight remote procedure calls for
processes on the NOW. The system currently uses AM-II<A NAME="3760"> </A>, a generalized
active message layer that supports programs beyond the SPMD model,
e.g., client-server programs and distributed file systems. It retains
the simple request/response paradigm common to previous active
message implementations, as well as their high performance.
Each of these six computers is a
collection of processing nodes interconnected by a network. Each node
has local memory and one or more processors.
Tables <A HREF="node113.html#tabnode">5.2</A>, <A HREF="node113.html#tabnode2">5.3</A>, and <A HREF="node113.html#tabnode3">5.4</A> describe
the characteristics of these six computers.
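<P>
ScaLAPACK treats the nodes of each such computer as a logical
two-dimensional process grid, initialized through the BLACS. The
following is a minimal sketch of such an initialization; the
2 x (NPROCS/2) grid shape is an illustrative choice, not the
configuration used in the timings reported below.
<PRE>
      PROGRAM GRID
*     Minimal BLACS process-grid setup (illustrative grid shape).
      INTEGER            IAM, NPROCS, ICTXT, NPROW, NPCOL,
     $                   MYROW, MYCOL
*     Find out how many processes there are and who we are.
      CALL BLACS_PINFO( IAM, NPROCS )
*     Get a default system context.
      CALL BLACS_GET( -1, 0, ICTXT )
*     Map the processes onto a 2 x (NPROCS/2) grid in row-major order.
      NPROW = 2
      NPCOL = NPROCS / NPROW
      CALL BLACS_GRIDINIT( ICTXT, 'Row', NPROW, NPCOL )
*     Query this process's coordinates in the grid.
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
*     ... distributed computation goes here ...
      CALL BLACS_GRIDEXIT( ICTXT )
      CALL BLACS_EXIT( 0 )
      END
</PRE>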
<P>
<P><A NAME="3765"> </A><A NAME="tabnode"> </A><IMG WIDTH=643 HEIGHT=313 ALIGN=BOTTOM ALT="table3764" SRC="img383.gif"><BR>
<STRONG>Table 5.2:</STRONG> Characteristics of the Cray T3E and IBM SP2 computers timed<BR>
<P>
<P>
<P><A NAME="3774"> </A><A NAME="tabnode2"> </A><IMG WIDTH=564 HEIGHT=356 ALIGN=BOTTOM ALT="table3773" SRC="img384.gif"><BR>
<STRONG>Table 5.3:</STRONG> Characteristics of the Intel computers timed<BR>
<P>
<P>
<P><A NAME="3783"> </A><A NAME="tabnode3"> </A><IMG WIDTH=601 HEIGHT=421 ALIGN=BOTTOM ALT="table3782" SRC="img385.gif"><BR>
<STRONG>Table 5.4:</STRONG> Characteristics of the networks of workstations timed<BR>
<P>
<P>
As noted in Tables <A HREF="node113.html#tabnode">5.2</A>, <A HREF="node113.html#tabnode2">5.3</A>, and <A HREF="node113.html#tabnode3">5.4</A>,
a machine-specific optimized BLAS
implementation was used for all the performance numbers reported
in this chapter. For the IBM Scalable POWERparallel 2 (SP2)
computer, the IBM Engineering and Scientific Subroutine Library (ESSL)
was used [<A HREF="node189.html#ibm1">88</A>]. On the Intel XP/S MP Paragon computer, the Intel
Basic Math Library Software (Release 5.0) [<A HREF="node189.html#intel">89</A>] was used.
The Intel ASCI Option Red Supercomputer was tested using
a pre-alpha version of the Cougar operating system and an
unoptimized, functional version of the dual-processor Basic Math Library from
Kuck and Associates, Inc. At the time of these tests, both the
communication performance and the library performance were still being enhanced.
On the Sun Ultra Enterprise 2
workstation, the Dakota Scientific Software Library (DSSL)<A NAME="tex2html976" HREF="footnode.html#3796"><IMG ALIGN=BOTTOM ALT="gif" SRC="http://www.netlib.org/utk/icons/foot_motif.gif"></A>
was used. The DSSL BLAS implementation used only one processor
per node. On the Berkeley NOW, the Sun Performance Library, version
1.2, was used. It should also be noted that on the IBM
SP2 the communication layer used was the IBM
Parallel Operating Environment (POE), which combines the
MPI and MPL libraries.
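<P>
The single-node performance of such optimized BLAS is typically
characterized by timing a large matrix-matrix multiply. The sketch
below shows one way to do this; the problem size <I>N</I> and the use
of the Fortran 90 intrinsic SYSTEM_CLOCK as the timer are assumptions
made for illustration, not the methodology of the timings in this chapter.
<PRE>
      PROGRAM TDGEMM
*     Illustrative single-node DGEMM timing; N is an example size,
*     and SYSTEM_CLOCK is an assumed (Fortran 90) timer.
      INTEGER            N
      PARAMETER          ( N = 1000 )
      DOUBLE PRECISION   A( N, N ), B( N, N ), C( N, N )
      INTEGER            T0, T1, RATE, I, J
      DOUBLE PRECISION   SECS
*     Fill A and B with simple test data.
      DO 20 J = 1, N
         DO 10 I = 1, N
            A( I, J ) = 1.0D0 / DBLE( I + J )
            B( I, J ) = DBLE( I - J )
            C( I, J ) = 0.0D0
   10    CONTINUE
   20 CONTINUE
*     Time C := A * B using the optimized BLAS routine DGEMM.
      CALL SYSTEM_CLOCK( T0, RATE )
      CALL DGEMM( 'N', 'N', N, N, N, 1.0D0, A, N, B, N,
     $            0.0D0, C, N )
      CALL SYSTEM_CLOCK( T1 )
      SECS = DBLE( T1 - T0 ) / DBLE( RATE )
*     The multiply performs 2*N**3 floating-point operations.
      WRITE( *, * ) 'DGEMM: ', SECS, ' seconds, ',
     $              2.0D0*DBLE( N )**3 / SECS / 1.0D6, ' Mflop/s'
      END
</PRE>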
<P>
Several data distributions were tried for <I>N</I>=2000, and the
fastest one was then used for all problem sizes, although
it may not be optimal for every problem size.
Whenever applicable, only the options UPLO=`U' and TRANS=`N' were timed.
The test matrices were generated with randomly distributed entries.
All run times are reported in seconds. Block size is denoted by <I>NB</I>.
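<P>
To make the roles of <I>NB</I> and UPLO concrete, the following sketch
distributes an <I>N</I>-by-<I>N</I> matrix block-cyclically with block
size <I>NB</I> and factors its upper triangle with the ScaLAPACK
Cholesky routine PDPOTRF. The 2 x 2 grid, the diagonal test matrix,
and the local array bound MAXLLD are illustrative assumptions, not
the configurations timed here.
<PRE>
      PROGRAM TPOTRF
*     Illustrative sketch: N x N matrix distributed block-cyclically
*     with NB x NB blocks, factored with UPLO = 'U'.  The 2 x 2 grid,
*     the test matrix, and the bound MAXLLD are example choices.
      INTEGER            N, NB, MAXLLD
      PARAMETER          ( N = 2000, NB = 32, MAXLLD = 1024 )
      INTEGER            IAM, NPROCS, ICTXT, NPROW, NPCOL, MYROW,
     $                   MYCOL, NP, NQ, LLD, INFO, I, J
      INTEGER            DESCA( 9 )
      DOUBLE PRECISION   A( MAXLLD, MAXLLD )
      INTEGER            NUMROC
      EXTERNAL           NUMROC
*     Set up a 2 x 2 process grid (run on four processes).
      CALL BLACS_PINFO( IAM, NPROCS )
      CALL BLACS_GET( -1, 0, ICTXT )
      NPROW = 2
      NPCOL = 2
      CALL BLACS_GRIDINIT( ICTXT, 'Row', NPROW, NPCOL )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
*     Local dimensions of this process's piece of the matrix.
      NP  = NUMROC( N, NB, MYROW, 0, NPROW )
      NQ  = NUMROC( N, NB, MYCOL, 0, NPCOL )
      LLD = MAXLLD
*     Array descriptor: N x N matrix, NB x NB blocks, first block
*     owned by the process at grid coordinates (0,0).
      CALL DESCINIT( DESCA, N, N, NB, NB, 0, 0, ICTXT, LLD, INFO )
*     Simple positive definite test matrix: zero off-diagonal
*     entries, large positive diagonal.
      DO 20 J = 1, NQ
         DO 10 I = 1, NP
            A( I, J ) = 0.0D0
   10    CONTINUE
   20 CONTINUE
      DO 30 I = 1, N
         CALL PDELSET( A, I, I, DESCA, DBLE( N + I ) )
   30 CONTINUE
*     Cholesky factorization of the upper triangle: A = U**T * U.
      CALL PDPOTRF( 'U', N, A, 1, 1, DESCA, INFO )
      IF( IAM.EQ.0 ) WRITE( *, * ) 'PDPOTRF INFO = ', INFO
      CALL BLACS_GRIDEXIT( ICTXT )
      CALL BLACS_EXIT( 0 )
      END
</PRE>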
<P>
This section first reports performance data for a relevant
selection of BLAS and BLACS routines. Then, timing results
obtained for some PBLAS routines are presented. Finally,
performance numbers for selected ScaLAPACK driver routines are shown.
<P>
<HR><A NAME="tex2html3625" HREF="node114.html"><IMG WIDTH=37 HEIGHT=24 ALIGN=BOTTOM ALT="next" SRC="http://www.netlib.org/utk/icons/next_motif.gif"></A> <A NAME="tex2html3623" HREF="node108.html"><IMG WIDTH=26 HEIGHT=24 ALIGN=BOTTOM ALT="up" SRC="http://www.netlib.org/utk/icons/up_motif.gif"></A> <A NAME="tex2html3617" HREF="node112.html"><IMG WIDTH=63 HEIGHT=24 ALIGN=BOTTOM ALT="previous" SRC="http://www.netlib.org/utk/icons/previous_motif.gif"></A> <A NAME="tex2html3627" HREF="node1.html"><IMG WIDTH=65 HEIGHT=24 ALIGN=BOTTOM ALT="contents" SRC="http://www.netlib.org/utk/icons/contents_motif.gif"></A> <A NAME="tex2html3628" HREF="node190.html"><IMG WIDTH=43 HEIGHT=24 ALIGN=BOTTOM ALT="index" SRC="http://www.netlib.org/utk/icons/index_motif.gif"></A> <BR>
<B> Next:</B> <A NAME="tex2html3626" HREF="node114.html">Performance of Selected BLACS </A>
<B>Up:</B> <A NAME="tex2html3624" HREF="node108.html">PerformancePortability and Scalability</A>
<B> Previous:</B> <A NAME="tex2html3618" HREF="node112.html">Parallel Efficiency</A>
<P><ADDRESS>
<I>Susan Blackford <BR>
Tue May 13 09:21:01 EDT 1997</I>
</ADDRESS>
</BODY>
</HTML>