<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<!--Converted with LaTeX2HTML 96.1-h (September 30, 1996) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds -->
<HTML>
<HEAD>
<TITLE>Performance of Selected PBLAS routines</TITLE>
<META NAME="description" CONTENT="Performance of Selected PBLAS routines">
<META NAME="keywords" CONTENT="slug">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<LINK REL=STYLESHEET HREF="slug.css">
</HEAD>
<BODY LANG="EN" >
<A NAME="tex2html3651" HREF="node116.html"><IMG WIDTH=37 HEIGHT=24 ALIGN=BOTTOM ALT="next" SRC="http://www.netlib.org/utk/icons/next_motif.gif"></A> <A NAME="tex2html3649" HREF="node114.html"><IMG WIDTH=26 HEIGHT=24 ALIGN=BOTTOM ALT="up" SRC="http://www.netlib.org/utk/icons/up_motif.gif"></A> <A NAME="tex2html3643" HREF="node114.html"><IMG WIDTH=63 HEIGHT=24 ALIGN=BOTTOM ALT="previous" SRC="http://www.netlib.org/utk/icons/previous_motif.gif"></A> <A NAME="tex2html3653" HREF="node1.html"><IMG WIDTH=65 HEIGHT=24 ALIGN=BOTTOM ALT="contents" SRC="http://www.netlib.org/utk/icons/contents_motif.gif"></A> <A NAME="tex2html3654" HREF="node190.html"><IMG WIDTH=43 HEIGHT=24 ALIGN=BOTTOM ALT="index" SRC="http://www.netlib.org/utk/icons/index_motif.gif"></A> <BR>
<B> Next:</B> <A NAME="tex2html3652" HREF="node116.html">Solution of Common Numerical </A>
<B>Up:</B> <A NAME="tex2html3650" HREF="node114.html">Performance of Selected BLACS </A>
<B> Previous:</B> <A NAME="tex2html3644" HREF="node114.html">Performance of Selected BLACS </A>
<BR> <P>
<H3><A NAME="SECTION04525100000000000000">Performance of Selected PBLAS routines</A></H3>
<A NAME="subsecperfpblas"> </A>
<P>
The performance of the Level 2 PBLAS routines
depends on the performance of the Level 2 BLAS routines,
which in turn depends on the bulk transfer
rate from main memory.
<P><A NAME="3851"> </A><A NAME="tabpblas2"> </A><IMG WIDTH=747 HEIGHT=605 ALIGN=BOTTOM ALT="table3850" SRC="img388.gif"><BR>
<STRONG>Table 5.6:</STRONG> Speed in Mflop/s for the PBLAS matrix-vector
multiply routine PSGEMV/PDGEMV<BR>
<P>
Table <A HREF="node115.html#tabpblas2">5.6</A><A NAME="3872"> </A>
shows execution rates for the 64-bit matrix-vector
multiply PBLAS routine PSGEMV<A NAME="3873"> </A>/PDGEMV<A NAME="3874"> </A>.
The rates listed are for a matrix-vector
product <IMG WIDTH=91 HEIGHT=25 ALIGN=MIDDLE ALT="tex2html_wrap_inline16483" SRC="img389.gif">, where <I>A</I>
is a square matrix of order <I>N</I> and <I>x</I> and
<I>y</I> are vectors that are both distributed
over a process column.
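<P>
As a rough illustration of how a Mflop/s figure of this kind can be derived
from a timing, the sketch below assumes the conventional count of two flops
(one multiply and one add) per matrix entry for a matrix-vector product of
order <I>N</I>. The function name <TT>gemv_mflops</TT> and the sample timing are
hypothetical and are not taken from the table.
<PRE>
```python
def gemv_mflops(n, seconds):
    """Return an execution rate in Mflop/s for a matrix-vector product.

    Assumption: the product involves a square matrix of order n and is
    charged 2*n*n floating-point operations (one multiply and one add
    per matrix entry).
    """
    if seconds <= 0.0:
        raise ValueError("elapsed time must be positive")
    flops = 2.0 * n * n
    return flops / seconds / 1.0e6


if __name__ == "__main__":
    # Hypothetical timing for illustration only, not a measured result.
    n = 2000
    elapsed = 0.25
    print("N =", n, "rate =", gemv_mflops(n, elapsed), "Mflop/s")
```
</PRE>
The same conversion applies to any timed run, provided the flop count
convention matches the one used for the reported figures.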
<P>
The Level 3 PBLAS are not necessarily limited
by memory bandwidth because they perform
many flops for each word of data involved,
so the flop rate is correspondingly higher.
Table <A HREF="node115.html#tabpblas3">5.7</A><A NAME="3876"> </A>
shows the performance results obtained by the general matrix-matrix
multiply PBLAS routine PSGEMM<A NAME="3898"> </A>/PDGEMM<A NAME="3899"> </A>.
These results were obtained for the matrix-matrix multiply operation
<IMG WIDTH=103 HEIGHT=25 ALIGN=MIDDLE ALT="tex2html_wrap_inline16435" SRC="img387.gif">,
where <I>A</I>, <I>B</I>, and <I>C</I> are square matrices of order <I>N</I>.
<P><A NAME="3878"> </A><A NAME="tabpblas3"> </A><IMG WIDTH=747 HEIGHT=605 ALIGN=BOTTOM ALT="table3877" SRC="img390.gif"><BR>
<STRONG>Table 5.7:</STRONG> Speed in Mflop/s for the PBLAS matrix-matrix
multiply routine PSGEMM/PDGEMM<BR>
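<P>
The difference between the Level 2 and Level 3 routines can be made concrete
by comparing flops per word of data touched. The sketch below is only an
estimate under stated assumptions: each operand is counted once, the
matrix-vector product is charged two flops per matrix entry, and the
matrix-matrix product is charged two flops per entry for each of
<I>N</I> inner-product terms. The function names are hypothetical.
<PRE>
```python
def gemv_flops_per_word(n):
    """Flops per word for a matrix-vector product of order n (assumed counts)."""
    flops = 2.0 * n * n        # one multiply and one add per matrix entry
    words = n * n + 2.0 * n    # the matrix plus the two vectors
    return flops / words


def gemm_flops_per_word(n):
    """Flops per word for a matrix-matrix product of order n (assumed counts)."""
    flops = 2.0 * n ** 3       # n multiply-add pairs per result entry
    words = 3.0 * n * n        # three square matrices
    return flops / words


if __name__ == "__main__":
    for n in (100, 1000, 4000):
        print(n, gemv_flops_per_word(n), gemm_flops_per_word(n))
```
</PRE>
Under these assumptions the matrix-vector ratio stays below two flops per
word for every <I>N</I>, while the matrix-matrix ratio grows in proportion
to <I>N</I>, which is consistent with the Level 3 routines being less
constrained by memory bandwidth.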
<P>
<BR> <HR>
<P><ADDRESS>
<I>Susan Blackford <BR>
Tue May 13 09:21:01 EDT 1997</I>
</ADDRESS>
</BODY>
</HTML>