File: node76.html

package info (click to toggle)
scalapack-doc 1.5-11
  • links: PTS
  • area: main
  • in suites: bullseye, buster, stretch
  • size: 10,336 kB
  • ctags: 4,931
  • sloc: makefile: 47; sh: 18
file content (287 lines) | stat: -rw-r--r-- 15,048 bytes parent folder | download | duplicates (4)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<!--Converted with LaTeX2HTML 96.1-h (September 30, 1996) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds -->
<HTML>
<HEAD>
<TITLE>Local Storage Scheme and Block-Cyclic Mapping</TITLE>
<META NAME="description" CONTENT="Local Storage Scheme and Block-Cyclic Mapping">
<META NAME="keywords" CONTENT="slug">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<LINK REL=STYLESHEET HREF="slug.css">
</HEAD>
<BODY LANG="EN" >
 <A NAME="tex2html3134" HREF="node77.html"><IMG WIDTH=37 HEIGHT=24 ALIGN=BOTTOM ALT="next" SRC="http://www.netlib.org/utk/icons/next_motif.gif"></A> <A NAME="tex2html3132" HREF="node74.html"><IMG WIDTH=26 HEIGHT=24 ALIGN=BOTTOM ALT="up" SRC="http://www.netlib.org/utk/icons/up_motif.gif"></A> <A NAME="tex2html3126" HREF="node75.html"><IMG WIDTH=63 HEIGHT=24 ALIGN=BOTTOM ALT="previous" SRC="http://www.netlib.org/utk/icons/previous_motif.gif"></A> <A NAME="tex2html3136" HREF="node1.html"><IMG WIDTH=65 HEIGHT=24 ALIGN=BOTTOM ALT="contents" SRC="http://www.netlib.org/utk/icons/contents_motif.gif"></A> <A NAME="tex2html3137" HREF="node190.html"><IMG WIDTH=43 HEIGHT=24 ALIGN=BOTTOM ALT="index" SRC="http://www.netlib.org/utk/icons/index_motif.gif"></A> <BR>
<B> Next:</B> <A NAME="tex2html3135" HREF="node77.html">Array Descriptor for In-core </A>
<B>Up:</B> <A NAME="tex2html3133" HREF="node74.html">In-core Dense Matrices</A>
<B> Previous:</B> <A NAME="tex2html3127" HREF="node75.html">The Two-dimensional Block-Cyclic Distribution</A>
<BR> <P>
<H2><A NAME="SECTION04432000000000000000">Local Storage Scheme and Block-Cyclic Mapping</A></H2>
<P>
                                                  <A NAME="seclocalstorage">&#160;</A>
<P>
The block-cyclic distribution scheme
is a mapping of a set of blocks onto
the processes. The previous section
informally described this mapping as
well as some of its properties. To
be complete, we must now explain how 
the blocks that are mapped to the 
same process are arranged and 
stored in the local process memory.
In other words, we shall describe
the precise mapping that associates
to a matrix entry identified by its
global indexes the coordinates of
the process that owns it and its 
local position within that process's
memory.
<P>
Suppose we have an array of length
<I>N</I> to be stored on <I>P</I> processes.
By convention, the array entries
are numbered 1 through <I>N</I> and
the processes are numbered 0
through <I>P</I>-1. First, the array
is divided into contiguous blocks
of size <I>NB</I>. When <I>NB</I> does not
divide <I>N</I> evenly, the last block
of array elements will only contain
<IMG WIDTH=98 HEIGHT=26 ALIGN=MIDDLE ALT="tex2html_wrap_inline14699" SRC="img271.gif"> entries instead
of <I>NB</I>. By convention, these blocks
are numbered starting from zero and
dealt out to the processes like a 
deck of cards. In other words, if
we assume that the process 0
receives the first block, the <IMG WIDTH=20 HEIGHT=15 ALIGN=BOTTOM ALT="tex2html_wrap_inline14705" SRC="img272.gif">
block is assigned to the process of
coordinate <IMG WIDTH=76 HEIGHT=26 ALIGN=MIDDLE ALT="tex2html_wrap_inline14707" SRC="img273.gif">. The
blocks assigned to the same process
are stored contiguously in memory.
The mapping of an array entry 
globally indexed by <I>I</I> is defined
by the following analytical equation:
<BR><IMG WIDTH=391 HEIGHT=17 ALIGN=BOTTOM ALT="displaymath14669" SRC="img274.gif"><BR>
where <I>I</I> is a global index in the
array, <I>l</I> is the local block coordinate
into which this entry resides, <I>p</I>
is the coordinate of the process
owning that block, and finally <I>x</I>
is the coordinate within that block
where the global array entry of
index <I>I</I> is to be found. It is
then fairly easy to establish the
analytical relationship between
these variables. One obtains:
<BR><A NAME="eqnindexmap">&#160;</A><IMG WIDTH=704 HEIGHT=18 ALIGN=BOTTOM ALT="equation2409" SRC="img275.gif"><BR>
These equations allow to determine
the local information, i.e. the
local index <IMG WIDTH=82 HEIGHT=25 ALIGN=MIDDLE ALT="tex2html_wrap_inline14721" SRC="img276.gif"> as
well as the process coordinate
<I>p</I> corresponding to a global entry
identified by its global index <I>I</I>
and conversely.  Table&nbsp;<A HREF="node76.html#tabbmap">4.3</A>
illustrates this mapping for the block
layout when <I>P</I>=2 and <I>N</I>=16, i.e.,
<I>NB</I>=8.  At most one block is assigned
to each process.
<P><A NAME="2417">&#160;</A><A NAME="tabbmap">&#160;</A><IMG WIDTH=616 HEIGHT=111 ALIGN=BOTTOM ALT="table2416" SRC="img277.gif"><BR>
<STRONG>Table 4.3:</STRONG> One-dimensional block mapping example for <I>P</I>=2 and <I>N</I>=16<BR>
<P>
<P>
This example of the one-dimensional
block distribution mapping can be
expressed in HPF<A NAME="2425">&#160;</A>
by using the following statements:
<PRE>      REAL :: X( N )
!HPF$ PROCESSORS PROC( P )
!HPF$ DISTRIBUTE X( BLOCK( NB ) ) ONTO PROC</PRE>
<P>
Table&nbsp;<A HREF="node76.html#tabcmap">4.4</A> illustrates
Equation&nbsp;<A HREF="node76.html#eqnindexmap">4.1</A> for 
the cyclic layout, i.e., <I>NB</I>=1
when <I>P</I>=2 and <I>N</I>=16.
<P><A NAME="2429">&#160;</A><A NAME="tabcmap">&#160;</A><IMG WIDTH=616 HEIGHT=111 ALIGN=BOTTOM ALT="table2428" SRC="img278.gif"><BR>
<STRONG>Table 4.4:</STRONG> One-dimensional cyclic mapping example for <I>P</I>=2
         and <I>N</I>=16<BR>
<P>
<P>
This example of the one-dimensional cyclic
distribution mapping can be expressed in
HPF<A NAME="2437">&#160;</A> by using the
following statements:
<PRE>      REAL :: X( N )
!HPF$ PROCESSORS PROC( P )
!HPF$ DISTRIBUTE X( CYCLIC ) ONTO PROC</PRE>
<P>
Table&nbsp;<A HREF="node76.html#tabbcmap">4.5</A> illustrates
Equation&nbsp;<A HREF="node76.html#eqnindexmap">4.1</A> for
the block-cyclic layout when <I>P</I>=2,
<I>NB</I>=3 and <I>N</I>=16.
<P><A NAME="2441">&#160;</A><A NAME="tabbcmap">&#160;</A><IMG WIDTH=616 HEIGHT=111 ALIGN=BOTTOM ALT="table2440" SRC="img279.gif"><BR>
<STRONG>Table 4.5:</STRONG> One-dimensional block-cyclic mapping example for
          <I>P</I>=2, <I>NB</I>=3 and <I>N</I>=16<BR>
<P>
<P>
This example of the one-dimensional cyclic
distribution mapping can be expressed in
HPF<A NAME="2449">&#160;</A> by using the
following statements:
<PRE>      REAL :: X( N )
!HPF$ PROCESSORS PROC( P )
!HPF$ DISTRIBUTE X( CYCLIC( NB ) ) ONTO PROC</PRE>
<P>
There is in fact no real
reason to always deal out
the blocks starting with
the process 0. In fact,
it is sometimes useful to
start the data distribution
with the process of arbitrary
coordinate SRC, in which case
Equation&nbsp;<A HREF="node76.html#eqnindexmap">4.1</A>
becomes:
<BR><A NAME="eqnindexmap1">&#160;</A><IMG WIDTH=522 HEIGHT=67 ALIGN=BOTTOM ALT="equation2451" SRC="img280.gif"><BR>
<P>
Table&nbsp;<A HREF="node76.html#tabbcmap0">4.6</A> 
illustrates Equation&nbsp;<A HREF="node76.html#eqnindexmap1">4.2</A>
for the block-cyclic layout when <IMG WIDTH=56 HEIGHT=12 ALIGN=BOTTOM ALT="tex2html_wrap_inline14791" SRC="img281.gif">, 
<IMG WIDTH=79 HEIGHT=12 ALIGN=BOTTOM ALT="tex2html_wrap_inline14793" SRC="img282.gif">, <IMG WIDTH=73 HEIGHT=13 ALIGN=BOTTOM ALT="tex2html_wrap_inline14795" SRC="img283.gif"> and <IMG WIDTH=67 HEIGHT=13 ALIGN=BOTTOM ALT="tex2html_wrap_inline14797" SRC="img284.gif">.
<P><A NAME="2462">&#160;</A><A NAME="tabbcmap0">&#160;</A><IMG WIDTH=616 HEIGHT=111 ALIGN=BOTTOM ALT="table2461" SRC="img285.gif"><BR>
<STRONG>Table 4.6:</STRONG> One-dimensional block-cyclic mapping example for
          <I>P</I>=2, <I>SRC</I>=1, <I>NB</I>=3 and <I>N</I>=16<BR>
<P>
This example
of the one-dimensional
block-cyclic distribution
mapping can be expressed
in HPF<A NAME="2470">&#160;</A>
by using the following statements:
<PRE>      REAL :: X( N )
!HPF$ PROCESSORS PROC( P )
!HPF$ TEMPLATE T( N + P*NB )
!HPF$ DISTRIBUTE T( CYCLIC( NB ) ) ONTO PROC
!HPF$ ALIGN X( I ) WITH T( SRC*NB + I )</PRE>
<P>
In the two-dimensional case,
assuming the matrix is partitioned
in <IMG WIDTH=83 HEIGHT=22 ALIGN=MIDDLE ALT="tex2html_wrap_inline14817" SRC="img286.gif"> blocks and that
the first block is given to the 
process of coordinates (<I>RSRC</I>, <I>CSRC</I>),
the analytical formula given above for
the one-dimensional case are simply 
reused independently in each dimension
of the <IMG WIDTH=57 HEIGHT=25 ALIGN=MIDDLE ALT="tex2html_wrap_inline12182" SRC="img25.gif"> process grid.
For example, the matrix entry (<I>I</I>,<I>J</I>)
is thus to be found in the process
of coordinates <IMG WIDTH=50 HEIGHT=26 ALIGN=MIDDLE ALT="tex2html_wrap_inline14629" SRC="img265.gif"> within 
the local (<I>l</I>,<I>m</I>) block at the 
position (<I>x</I>,<I>y</I>) given by:
<BR><IMG WIDTH=660 HEIGHT=67 ALIGN=BOTTOM ALT="displaymath14670" SRC="img287.gif"><BR>
<P>
These formula specify how an <IMG WIDTH=38 HEIGHT=12 ALIGN=BOTTOM ALT="tex2html_wrap_inline14831" SRC="img288.gif">
by <IMG WIDTH=35 HEIGHT=12 ALIGN=BOTTOM ALT="tex2html_wrap_inline14833" SRC="img289.gif"><A NAME="2480">&#160;</A>
matrix <I>A</I> is mapped and stored on the 
process grid. It is first decomposed
into <IMG WIDTH=52 HEIGHT=12 ALIGN=BOTTOM ALT="tex2html_wrap_inline14837" SRC="img290.gif"> by <IMG WIDTH=49 HEIGHT=12 ALIGN=BOTTOM ALT="tex2html_wrap_inline14839" SRC="img291.gif"><A NAME="2481">&#160;</A>
blocks starting at its upper left
corner.  These blocks are then 
uniformly distributed across the
process grid in a cyclic manner.
<P>
Every process owns a collection
of blocks, which are contiguously
stored by column in a two-dimensional
``column major'' array.
<P>
This local storage convention
allows the ScaLAPACK software to
use efficiently the local memory
hierarchy by calling the BLAS on
subarrays that may be larger than
a single <IMG WIDTH=52 HEIGHT=12 ALIGN=BOTTOM ALT="tex2html_wrap_inline14837" SRC="img290.gif"> by <IMG WIDTH=49 HEIGHT=12 ALIGN=BOTTOM ALT="tex2html_wrap_inline14839" SRC="img291.gif"> block.
We present in figure&nbsp;<A HREF="node76.html#figmat5">4.5</A><A NAME="2483">&#160;</A>
the mapping of a <TT>5</TT><IMG WIDTH=9 HEIGHT=16 ALIGN=MIDDLE ALT="tex2html_wrap_inline14845" SRC="img292.gif"><TT>5</TT>
matrix partitioned into <TT>2</TT><IMG WIDTH=9 HEIGHT=16 ALIGN=MIDDLE ALT="tex2html_wrap_inline14845" SRC="img292.gif"><TT>2</TT>
blocks mapped onto a <TT>2</TT><IMG WIDTH=9 HEIGHT=16 ALIGN=MIDDLE ALT="tex2html_wrap_inline14845" SRC="img292.gif"><TT>2</TT>
process grid (i.e., <IMG WIDTH=128 HEIGHT=13 ALIGN=BOTTOM ALT="tex2html_wrap_inline14851" SRC="img293.gif">, <IMG WIDTH=91 HEIGHT=25 ALIGN=MIDDLE ALT="tex2html_wrap_inline14651" SRC="img270.gif">,
and <IMG WIDTH=156 HEIGHT=12 ALIGN=BOTTOM ALT="tex2html_wrap_inline14855" SRC="img294.gif">).  The local entries of 
every matrix column are contiguously stored
in the processes' memories.
<P>
<P><A NAME="7961">&#160;</A><IMG WIDTH=460 HEIGHT=199 ALIGN=BOTTOM ALT="figure2490" SRC="img297.gif"><BR>
<STRONG>Figure 4.6:</STRONG> A <IMG WIDTH=37 HEIGHT=24 ALIGN=MIDDLE ALT="tex2html_wrap_inline14857" SRC="img295.gif"> matrix decomposed into <IMG WIDTH=37 HEIGHT=22 ALIGN=MIDDLE ALT="tex2html_wrap_inline14859" SRC="img296.gif">  <A NAME="figmat5">&#160;</A>
 blocks mapped onto a <IMG WIDTH=37 HEIGHT=22 ALIGN=MIDDLE ALT="tex2html_wrap_inline14859" SRC="img296.gif"> process grid<BR>
<P>
<P>
In figure&nbsp;<A HREF="node76.html#figmat5">4.5</A>, the process
of coordinates (0,0) owns four blocks.
The matrix entries of the global columns
1, 2 and 5 are contiguously stored in
that process's memory. Finally, these
columns are themselves continuously stored
forming a conventional two-dimensional
local array.  In that local array <I>A</I>,
the entry <I>A</I>(2,3) contains the value
of the global matrix entry <IMG WIDTH=21 HEIGHT=17 ALIGN=MIDDLE ALT="tex2html_wrap_inline14869" SRC="img298.gif">.
This example would be expressed in 
HPF<A NAME="2496">&#160;</A> as:
<PRE>      REAL :: A( 5, 5 )
!HPF$ PROCESSORS PROC( 2, 2 )
!HPF$ DISTRIBUTE A( CYCLIC( 2 ), CYCLIC( 2 ) ) ONTO PROC</PRE>
<P>
Determining the number of
<A NAME="2497">&#160;</A>
<A NAME="2498">&#160;</A>
rows or columns of a global
dense matrix that a specific
process receives is an 
essential task for the
user. ScaLAPACK provides
a tool routine, NUMROC,
to perform this function.
The notation LOC<IMG WIDTH=6 HEIGHT=7 ALIGN=MIDDLE ALT="tex2html_wrap_inline12112" SRC="img15.gif">()
and LOC<IMG WIDTH=6 HEIGHT=7 ALIGN=MIDDLE ALT="tex2html_wrap_inline12114" SRC="img16.gif">() is used
to reflect these local
quantities throughout the
leading comments of the
source code and is reflected
in the sample argument 
description in section&nbsp;<A HREF="node79.html#subsecargdesc">4.3.5</A>.
The values of LOC<IMG WIDTH=6 HEIGHT=7 ALIGN=MIDDLE ALT="tex2html_wrap_inline12112" SRC="img15.gif">()<A NAME="7962">&#160;</A>
and LOC<IMG WIDTH=6 HEIGHT=7 ALIGN=MIDDLE ALT="tex2html_wrap_inline12114" SRC="img16.gif">()<A NAME="7963">&#160;</A> computed
by NUMROC are precise calculations.
<P>
However, if users want a
general idea of the size
of a local array, they can
perform the following ``back
of the envelope'' calculation 
to receive an upper bound on
the quantity.
<P>
An upper bound on the value of LOC<IMG WIDTH=6 HEIGHT=7 ALIGN=MIDDLE ALT="tex2html_wrap_inline12112" SRC="img15.gif">() can be calculated as:
<BR><IMG WIDTH=434 HEIGHT=57 ALIGN=BOTTOM ALT="displaymath14671" SRC="img299.gif"><BR>
or equivalently as
<BR><IMG WIDTH=406 HEIGHT=18 ALIGN=BOTTOM ALT="displaymath14672" SRC="img300.gif"><BR>
<P>
Similarly, an upper bound on the value of
LOC<IMG WIDTH=6 HEIGHT=7 ALIGN=MIDDLE ALT="tex2html_wrap_inline12114" SRC="img16.gif">() can be calculated as
<BR><IMG WIDTH=429 HEIGHT=57 ALIGN=BOTTOM ALT="displaymath14673" SRC="img301.gif"><BR>
or equivalently as
<BR><IMG WIDTH=401 HEIGHT=18 ALIGN=BOTTOM ALT="displaymath14674" SRC="img302.gif"><BR>
<P>
Note that this calculation can 
yield a gross overestimate of
the amount of space actually
required.
<P>
<HR><A NAME="tex2html3134" HREF="node77.html"><IMG WIDTH=37 HEIGHT=24 ALIGN=BOTTOM ALT="next" SRC="http://www.netlib.org/utk/icons/next_motif.gif"></A> <A NAME="tex2html3132" HREF="node74.html"><IMG WIDTH=26 HEIGHT=24 ALIGN=BOTTOM ALT="up" SRC="http://www.netlib.org/utk/icons/up_motif.gif"></A> <A NAME="tex2html3126" HREF="node75.html"><IMG WIDTH=63 HEIGHT=24 ALIGN=BOTTOM ALT="previous" SRC="http://www.netlib.org/utk/icons/previous_motif.gif"></A> <A NAME="tex2html3136" HREF="node1.html"><IMG WIDTH=65 HEIGHT=24 ALIGN=BOTTOM ALT="contents" SRC="http://www.netlib.org/utk/icons/contents_motif.gif"></A> <A NAME="tex2html3137" HREF="node190.html"><IMG WIDTH=43 HEIGHT=24 ALIGN=BOTTOM ALT="index" SRC="http://www.netlib.org/utk/icons/index_motif.gif"></A> <BR>
<B> Next:</B> <A NAME="tex2html3135" HREF="node77.html">Array Descriptor for In-core </A>
<B>Up:</B> <A NAME="tex2html3133" HREF="node74.html">In-core Dense Matrices</A>
<B> Previous:</B> <A NAME="tex2html3127" HREF="node75.html">The Two-dimensional Block-Cyclic Distribution</A>
<P><ADDRESS>
<I>Susan Blackford <BR>
Tue May 13 09:21:01 EDT 1997</I>
</ADDRESS>
</BODY>
</HTML>