<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<!--Converted with LaTeX2HTML 96.1-h (September 30, 1996) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds -->
<HTML>
<HEAD>
<TITLE>Performance of Selected BLACS and Level 3 BLAS Routines</TITLE>
<META NAME="description" CONTENT="Performance of Selected BLACS and Level 3 BLAS Routines">
<META NAME="keywords" CONTENT="slug">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<LINK REL=STYLESHEET HREF="slug.css">
</HEAD>
<BODY LANG="EN" >
<A NAME="tex2html3637" HREF="node115.html"><IMG WIDTH=37 HEIGHT=24 ALIGN=BOTTOM ALT="next" SRC="http://www.netlib.org/utk/icons/next_motif.gif"></A> <A NAME="tex2html3635" HREF="node108.html"><IMG WIDTH=26 HEIGHT=24 ALIGN=BOTTOM ALT="up" SRC="http://www.netlib.org/utk/icons/up_motif.gif"></A> <A NAME="tex2html3629" HREF="node113.html"><IMG WIDTH=63 HEIGHT=24 ALIGN=BOTTOM ALT="previous" SRC="http://www.netlib.org/utk/icons/previous_motif.gif"></A> <A NAME="tex2html3639" HREF="node1.html"><IMG WIDTH=65 HEIGHT=24 ALIGN=BOTTOM ALT="contents" SRC="http://www.netlib.org/utk/icons/contents_motif.gif"></A> <A NAME="tex2html3640" HREF="node190.html"><IMG WIDTH=43 HEIGHT=24 ALIGN=BOTTOM ALT="index" SRC="http://www.netlib.org/utk/icons/index_motif.gif"></A> <BR>
<B> Next:</B> <A NAME="tex2html3638" HREF="node115.html">Performance of Selected PBLAS </A>
<B>Up:</B> <A NAME="tex2html3636" HREF="node108.html">Performance, Portability and Scalability</A>
<B> Previous:</B> <A NAME="tex2html3630" HREF="node113.html">ScaLAPACK Performance</A>
<BR> <P>
<H2><A NAME="SECTION04525000000000000000">Performance of Selected BLACS and Level 3 BLAS Routines</A></H2>
<P>
<A NAME="subsecblasblacsperf"> </A>
<P>
The efficiency<A NAME="3799"> </A> of the ScaLAPACK software depends
on efficient implementations of the BLAS and the BLACS being
provided by computer vendors (or others) for their computers.
The BLAS and the BLACS form a low-level interface between
ScaLAPACK software and different computer architectures.
Table <A HREF="node114.html#tabblacs">5.5</A> presents performance numbers indicating
how well the BLACS and Level 3 BLAS perform on different
distributed-memory computers. For each computer this table shows the
flop rate achieved by the matrix-matrix multiply Level 3 BLAS
routine SGEMM/DGEMM (<IMG WIDTH=39 HEIGHT=25 ALIGN=MIDDLE ALT="tex2html_wrap_inline12088" SRC="img9.gif">)<A NAME="7989"> </A> on a node versus the theoretical
peak performance of that node,
the underlying message-passing library
called by the BLACS,
and approximate values of the latency (<IMG WIDTH=18 HEIGHT=23 ALIGN=MIDDLE ALT="tex2html_wrap_inline12208" SRC="img30.gif">)<A NAME="3803"> </A> and the
bandwidth (<IMG WIDTH=28 HEIGHT=27 ALIGN=MIDDLE ALT="tex2html_wrap_inline12064" SRC="img2.gif">)<A NAME="3804"> </A> achieved by the BLACS versus the
underlying message-passing software for the machine.
<P><A NAME="3806"> </A><A NAME="tabblacs"> </A><IMG WIDTH=681 HEIGHT=201 ALIGN=BOTTOM ALT="table3805" SRC="img386.gif"><BR>
<STRONG>Table 5.5:</STRONG> BLACS and Level 3 BLAS performance indicators<BR>
<P>
<P>
The latency values in Table <A HREF="node114.html#tabblacs">5.5</A> were obtained by timing the cost
of a 0-byte message. The bandwidth numbers in Table <A HREF="node114.html#tabblacs">5.5</A> were obtained
by increasing the message length until the message bandwidth was saturated.
We used the same timing mechanism for both the
BLACS and the underlying message-passing library.
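<P>
The measurement procedure described above can be illustrated with a minimal sketch. It is not the timing code used for the table: the function name <TT>estimate_latency_and_bandwidth</TT>, the saturation tolerance, and the synthetic timings (generated from a hypothetical linear cost model) are all assumptions made for illustration. The latency is taken as the time of the 0-byte message, and the bandwidth as the transfer rate at the first message length where a longer message no longer raises the rate noticeably.

```python
def estimate_latency_and_bandwidth(timings, tolerance=0.05):
    """Estimate (latency, bandwidth) from one-way message timings.

    timings: list of (message_bytes, seconds) pairs sorted by size,
             whose first entry is the 0-byte message.
    tolerance: assumed relative gain below which the rate counts as saturated.
    """
    size0, latency = timings[0]
    if size0 != 0:
        raise ValueError("first timing must be for a 0-byte message")

    previous = None
    bandwidth = 0.0
    for nbytes, seconds in timings[1:]:
        bandwidth = nbytes / seconds
        # Saturation: growing the message no longer raises the rate much.
        if previous is not None and tolerance * previous >= bandwidth - previous:
            break
        previous = bandwidth
    return latency, bandwidth


# Synthetic timings from a hypothetical linear cost model, not measured data.
ALPHA = 50e-6   # assumed latency in seconds
BETA = 30e6     # assumed asymptotic bandwidth in bytes per second
sizes = [0] + [2 ** k for k in range(10, 25)]
timings = [(n, ALPHA + n / BETA) for n in sizes]

latency, bandwidth = estimate_latency_and_bandwidth(timings)
print(f"latency   ~ {latency * 1e6:.1f} microseconds")
print(f"bandwidth ~ {bandwidth / 1e6:.1f} MB/s")
```

In a real measurement the pairs in <TT>timings</TT> would come from timing the same exchange once through the BLACS and once through the underlying message-passing library, so that the two sets of estimates can be compared.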
<P>
These numbers are measured timings, not values
derived from hardware peaks. They should therefore
be regarded as approximate
indicators of the observed performance between two nodes, rather than
as precise evaluations of the interconnection network's capabilities.
On the CRAY, the numbers reported
are for MPI and the MPIBLACS, rather than for the more highly optimized shmem library with
CRAY's native BLACS.
<P>
For all four computers, a machine-specific optimized BLAS
implementation was used for all the performance numbers reported
in this chapter. For the IBM Scalable POWERparallel 2 (SP2)
computer, the IBM Engineering and Scientific Subroutine Library (ESSL)
was used [<A HREF="node189.html#ibm1">88</A>]. On the Intel XP/S MP Paragon computer, the Intel
Basic Math Library Software (Release 5.0) [<A HREF="node189.html#intel">89</A>] was used.
On the Sun Ultra Enterprise 2
workstation, the Dakota Scientific Software Library (DSSL)<A NAME="tex2html982" HREF="footnode.html#3846"><IMG ALIGN=BOTTOM ALT="gif" SRC="http://www.netlib.org/utk/icons/foot_motif.gif"></A>
was used. The DSSL BLAS implementation used only one processor
per node. The speed of the BLAS matrix-matrix multiply routine shown
in Table <A HREF="node114.html#tabblacs">5.5</A> was measured for the
operation <IMG WIDTH=103 HEIGHT=25 ALIGN=MIDDLE ALT="tex2html_wrap_inline16435" SRC="img387.gif">,
where <I>A</I>, <I>B</I>, and <I>C</I> are square matrices of order 500.
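<P>
As a rough illustration of how a flop rate for such a run relates to a node's theoretical peak, the sketch below uses the conventional count of 2<I>n</I><SUP>3</SUP> floating-point operations for a matrix-matrix multiply of order <I>n</I>, ignoring lower-order terms. The elapsed time and the peak rate are hypothetical values chosen for the example, not figures from the table, and the function names are illustrative.

```python
def gemm_flops(n):
    """Conventional flop count for multiplying two square matrices of order n."""
    return 2 * n ** 3


def mflops(n, seconds):
    """Achieved rate in Mflop/s for one order-n multiply taking `seconds`."""
    return gemm_flops(n) / seconds / 1e6


def fraction_of_peak(achieved_mflops, peak_mflops):
    """Achieved rate expressed as a fraction of the theoretical peak."""
    return achieved_mflops / peak_mflops


ORDER = 500          # matrix order used in the text
elapsed = 1.25       # hypothetical wall-clock time in seconds
peak = 400.0         # hypothetical theoretical peak in Mflop/s

rate = mflops(ORDER, elapsed)
share = fraction_of_peak(rate, peak)
print(f"{rate:.1f} Mflop/s, {share:.0%} of peak")
```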
<P>
<BR> <HR>
<UL><A NAME="CHILD_LINKS"> </A>
<LI> <A NAME="tex2html3641" HREF="node115.html#SECTION04525100000000000000">Performance of Selected PBLAS routines</A>
<LI> <A NAME="tex2html3642" HREF="node116.html#SECTION04525200000000000000">Solution of Common Numerical Linear Algebra Problems</A>
</UL>
<HR>
<P><ADDRESS>
<I>Susan Blackford <BR>
Tue May 13 09:21:01 EDT 1997</I>
</ADDRESS>
</BODY>
</HTML>