File: GrB_objects_formats.tex

package info (click to toggle)
suitesparse 1%3A7.10.1%2Bdfsg-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, trixie
  • size: 254,920 kB
  • sloc: ansic: 1,134,743; cpp: 46,133; makefile: 4,875; fortran: 2,087; java: 1,826; sh: 996; ruby: 725; python: 495; asm: 371; sed: 166; awk: 44
file content (361 lines) | stat: -rw-r--r-- 15,943 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
\newpage
%-------------------------------------------------------------------------------
\subsection{SuiteSparse:GraphBLAS data formats}
%-------------------------------------------------------------------------------
\label{formats}

SuiteSparse:GraphBLAS uses four distinct data formats: sparse, hypersparse,
bitmap, and full, each in row-major or column-major orientations, for eight
different variants (each of which are listed below).
Each of these eight total variants can be iso-valued, where if
\verb'Container->iso' is true the numerical values are all the same, and
\verb'Container->x' holds a single entry with this value.
Each of the sparse and hypersparse formats can appear in {\em jumbled} form,
where the indices within any given row (if the orientation is row-major)
or column may be out of order.  If \verb'Container->jumbled' is false, then
the indices appear in ascending order.

The \verb'p', \verb'h', and \verb'i' vectors in the Container have an integer
type, either \verb'GrB_UINT32' or \verb'GrB_UINT64'.  These appear below as
just \verb'integer', but the actual corresponding C type (\verb'uint32_t' or
\verb'uint64_t') must be used for each component.

%-------------------------------------------------------------------------------
\subsubsection{Sparse, held by row}
%-------------------------------------------------------------------------------
\label{format_sparse_by_row}

A sparse matrix in CSR format, held by row, has a \verb'Container->format'
value of \verb'GxB_SPARSE' and a \verb'Container->orientation' of
\verb'GrB_ROWMAJOR'.  It requires three arrays:

\begin{itemize}
\item \verb'integer Ap [nrows+1] ;'  The \verb'Ap' array is the row
``pointer'' array.  It does not actual contain pointers, but integer offsets.
More precisely, it is an integer array that defines where the column indices
and values appear in \verb'Aj' and \verb'Ax', for each row.  The number of
entries in row \verb'i' is given by the expression \verb'Ap [i+1] - Ap [i]'.

\item \verb'integer Aj [nvals] ;'  The \verb'Aj' array defines the
column indices of entries in each row.

\item \verb'type Ax [nvals] ;'  The \verb'Ax' array defines the values of
entries in each row.  
\end{itemize}

The content of the three arrays \verb'Ap' \verb'Aj', and \verb'Ax' is very
specific.  This content is not checked, since this function takes only
$O(1)$ time.  Results are undefined if the following specification is not
followed exactly.

The column indices of entries in the ith row of the matrix are held in
\verb'Aj [Ap [i] ... Ap[i+1]]', and the corresponding values are held in the
same positions in \verb'Ax'.  Column indices must be in the range 0 to
\verb'ncols'-1.  If \verb'jumbled' is \verb'false', column indices must appear
in ascending order within each row.  If \verb'jumbled' is \verb'true', column
indices may appear in any order within each row.  No duplicate column indices
may appear in any row.  \verb'Ap [0]' must equal zero, and \verb'Ap [nrows]'
must equal \verb'nvals'.  The \verb'Ap' array must be of size \verb'nrows'+1
(or larger), and the \verb'Aj' and \verb'Ax' arrays must have size at least
\verb'nvals'.

An example of the CSR format is shown below.  Consider the following
matrix with 10 nonzero entries, and suppose the zeros are not stored.

    \begin{equation}
    \label{eqn:Aexample}
    A = \left[
    \begin{array}{cccc}
    4.5 &   0 & 3.2 &   0 \\
    3.1 & 2.9 &  0  & 0.9 \\
     0  & 1.7 & 3.0 &   0 \\
    3.5 & 0.4 &  0  & 1.0 \\
    \end{array}
    \right]
    \end{equation}

The \verb'Ap' array has length 5, since the matrix is 4-by-4.  The first entry
must always zero, and \verb'Ap [5] = 10' is the number of entries.
The content of the arrays is shown below:

{\footnotesize
\begin{verbatim}
    integer Ap [ ] = { 0,        2,             5,        7,            10 } ;
    integer Aj [ ] = { 0,   2,   0,   1,   3,   1,   2,   0,   1,   3   } ;
    double  Ax [ ] = { 4.5, 3.2, 3.1, 2.9, 0.9, 1.7, 3.0, 3.5, 0.4, 1.0 } ; \end{verbatim} }

Spaces have been added to the \verb'Ap' array, just for illustration.  Row zero
is in \verb'Aj [0..1]' (column indices) and \verb'Ax [0..1]' (values), starting
at \verb'Ap [0] = 0' and ending at \verb'Ap [0+1]-1 = 1'.  The list of column
indices of row one is at \verb'Aj [2..4]' and row two is in \verb'Aj [5..6]'.
The last row (three) appears \verb'Aj [7..9]', because \verb'Ap [3] = 7' and
\verb'Ap [4]-1 = 10-1 = 9'.  The corresponding numerical values appear in the
same positions in \verb'Ax'.

To iterate over the rows and entries of this matrix, the following code can be
used (assuming it has type \verb'GrB_FP64'):

    {\footnotesize
    \begin{verbatim}
    integer nvals = Ap [nrows] ;
    for (integer i = 0 ; i < nrows ; i++)
    {
        // get A(i,:)
        for (integer p = Ap [i] ; p < Ap [i+1] ; p++)
        {
            // get A(i,j)
            integer  j = Aj [p] ;           // column index
            double aij = Ax [iso ? 0 : p] ;   // numerical value
        }
    } \end{verbatim}}

In the container, the three arrays \verb'Ap', \verb'Aj' and \verb'Ax'
are held in three \verb'GrB_Vector' objects:
\verb'Container->p',
\verb'Container->i', and
\verb'Container->x', respectively.

%-------------------------------------------------------------------------------
\subsubsection{Sparse, held by column}
%-------------------------------------------------------------------------------
\label{format_sparse_by_col}

This format is the transpose of sparse-by-row.  A sparse matrix in CSC format,
held by column, has a \verb'Container->format' value of \verb'GxB_SPARSE' and a
\verb'Container->orientation' of \verb'GrB_COLMAJOR'.  It requires three
arrays: \verb'Ap', \verb'Ai', and \verb'Ax'.

The column ``pointer'' array \verb'Ap' has size \verb'ncols+1'.  The row
indices of the columns are in \verb'Ai', and if \verb'jumbled' is false,
they must appear in ascending order in
each column.  The corresponding numerical values are held in \verb'Ax'.  The
row indices of column \verb'j' are held in \verb'Ai [Ap [j]...Ap [j+1]-1]',
and the corresponding numerical values are in the same locations in \verb'Ax'.

The same matrix from Equation~\ref{eqn:Aexample} in
the last section (repeated here):

    \begin{equation}
    A = \left[
    \begin{array}{cccc}
    4.5 &   0 & 3.2 &   0 \\
    3.1 & 2.9 &  0  & 0.9 \\
     0  & 1.7 & 3.0 &   0 \\
    3.5 & 0.4 &  0  & 1.0 \\
    \end{array}
    \right]
    \end{equation}

is held in CSC form as follows:

{\footnotesize
\begin{verbatim}
    integer Ap [ ] = { 0,             3,             6,        8,       10 } ;
    integer Ai [ ] = { 0,   1,   3,   1,   2,   3,   0,   2,   1,   3   } ;
    double  Ax [ ] = { 4.5, 3.1, 3.5, 2.9, 1.7, 0.4, 3.2, 3.0, 0.9, 1.0 } ; \end{verbatim} }

That is, the row indices of column 1 (the second column) are in
\verb'Ai [3..5]', and the values in the same place in \verb'Ax',
since \verb'Ap [1] = 3' and \verb'Ap [2]-1 = 5'.

To iterate over the columns and entries of this matrix, the following code can
be used (assuming it has type \verb'GrB_FP64'):

    {\footnotesize
    \begin{verbatim}
    integer nvals = Ap [ncols] ;
    for (integer j = 0 ; j < ncols ; j++)
    {
        // get A(:,j)
        for (integer p = Ap [j] ; p < Ap [j+1] ; p++)
        {
            // get A(i,j)
            integer  i = Ai [p] ;             // row index
            double aij = Ax [iso ? 0 : p] ;   // numerical value
        }
    } \end{verbatim}}

In the container, the three arrays \verb'Ap', \verb'Ai' and \verb'Ax'
are held in three \verb'GrB_Vector' objects:
\verb'Container->p',
\verb'Container->i', and
\verb'Container->x', respectively.

%-------------------------------------------------------------------------------
\subsubsection{Hypersparse, held by row}
%-------------------------------------------------------------------------------
\label{format_hypersparse_by_row}

The hypersparse HyperCSR format is identical to the CSR format, except that the
\verb'Ap' array itself becomes sparse, if the matrix has rows that are
completely empty.  An array \verb'Ah' of size \verb'nvec' provides a list of
rows that appear in the data structure.  For example, consider
Equation~\ref{eqn:Ahyper}, which is a sparser version of the matrix in
Equation~\ref{eqn:Aexample}.  Row 2 and column 1 of this matrix are all empty.

    \begin{equation}
    \label{eqn:Ahyper}
    A = \left[
    \begin{array}{cccc}
    4.5 &   0 & 3.2 &   0 \\
    3.1 &   0 &  0  & 0.9 \\
     0  &   0 &  0  &   0 \\
    3.5 &   0 &  0  & 1.0 \\
    \end{array}
    \right]
    \end{equation}

The conventional CSR format would appear as follows.  Since the third row (row
2) is all zero, accessing \verb'Ai [Ap [2] ... Ap [3]-1]' gives an empty set
(\verb'[2..1]'), and the number of entries in this row is
\verb'Ap [i+1] - Ap [i]' \verb'= Ap [3] - Ap [2] = 0'.

{\footnotesize
\begin{verbatim}
    integer Ap [ ] = { 0,        2,2,      4,       5 } ;
    integer Aj [ ] = { 0,   2,   0,   3,   0    3   }
    double  Ax [ ] = { 4.5, 3.2, 3.1, 0.9, 3.5, 1.0 } ; \end{verbatim} }

A hypersparse CSR format for this same matrix would discard
these duplicate integers in \verb'Ap'.  Doing so requires
another array, \verb'Ah', that keeps track of the rows that appear
in the data structure.

{\footnotesize
\begin{verbatim}
    integer nvec = 3 ;
    integer Ah [ ] = { 0,        1,        3        } ;
    integer Ap [ ] = { 0,        2,        4,       5 } ;
    integer Aj [ ] = { 0,   2,   0,   3,   0    3   }
    double  Ax [ ] = { 4.5, 3.2, 3.1, 0.9, 3.5, 1.0 } ; \end{verbatim} }

Note that the \verb'Aj' and \verb'Ax' arrays are the same in the CSR and
HyperCSR formats.  If \verb'jumbled' is false, the row indices in \verb'Ah'
must appear in ascending order, and no duplicates can appear.  To iterate over
this data structure (assuming it has type \verb'GrB_FP64'):

    {\footnotesize
    \begin{verbatim}
    integer nvals = Ap [nvec] ;
    for (integer k = 0 ; k < nvec ; k++)
    {
        integer i = Ah [k] ;                // row index
        // get A(i,:)
        for (integer p = Ap [k] ; p < Ap [k+1] ; p++)
        {
            // get A(i,j)
            integer  j = Aj [p] ;             // column index
            double aij = Ax [iso ? 0 : p] ;   // numerical value
        }
    } \end{verbatim}}

\vspace{-0.05in}
This is more complex than the CSR format, but it requires at most
$O(e)$ space, where $A$ is $m$-by-$n$ with $e$ = \verb'nvals' entries.  The
CSR format requires $O(m+e)$ space.  If $e << m$, then the size $m+1$
of \verb'Ap' can dominate the memory required.  In the hypersparse form,
\verb'Ap' takes on size \verb'nvec+1', and \verb'Ah' has size \verb'nvec',
where \verb'nvec' is the number of rows that appear in the data structure.
The CSR format can be viewed as a dense array (of size \verb'nrows')
of sparse row vectors.   By contrast, the hypersparse CSR format is a sparse
array (of size \verb'nvec') of sparse row vectors.

In the container, the four arrays \verb'Ap', \verb'Ah', \verb'Aj' and \verb'Ax'
are held in four \verb'GrB_Vector' objects:
\verb'Container->p',
\verb'Container->h',
\verb'Container->i', and \newline
\verb'Container->x', respectively.

In addition, the container may hold an optional optimization structure,
\verb'Container->Y', called the hyper-hash.  This is a \verb'GrB_Matrix' that
holds the inverse of the \verb'Container->h' array (called \verb'Y' because it
looks like an upside-down \verb'h').  If a matrix is being loaded from raw
data, the hyper-hash is not yet constructed, so the \verb'Container->Y' matrix
should be set to \verb'NULL'.  GraphBLAS will compute it when needed.

When a matrix is unload into a container, GraphBLAS will place the hyper-hash
matrix there if it has been computed.  If the matrix is subsequently loaded
from the container, and \verb'Container->h' is unchanged, then leaving the
hyper-hash unmodified will preserve this optional optimization data structure.
If instead \verb'Container->h' is revised, the hyper-hash in
\verb'Container->Y' must be freed (or at least removed from the container) when
the matrix is loaded from the container..

A \verb'GrB_Vector' is never held in hypersparse format.

%-------------------------------------------------------------------------------
\subsubsection{Hypersparse, held by column}
%-------------------------------------------------------------------------------
\label{format_hypersparse_by_col}

The hypersparse-by-column format is the transpose of the hypersparse-by-row format.
The \verb'Container->format' is \verb'GxB_HYPERSPARSE' and the \newline
\verb'Container->orientation' is \verb'GrB_COLMAJOR'.
In the container, the four arrays \verb'Ap', \verb'Ah', \verb'Ai' and \verb'Ax'
are held in four \verb'GrB_Vector' objects:
\verb'Container->p',
\verb'Container->h',
\verb'Container->i', and
\verb'Container->x', respectively.
A \verb'GrB_Vector' is never held in hypersparse format.

%-------------------------------------------------------------------------------
\subsubsection{Bitmap, held by row}
%-------------------------------------------------------------------------------
\label{format_bitmap_by_row}

The \verb'Container->format' is \verb'GxB_BITMAP' and the
\verb'Container->orientation' is \verb'GrB_ROWMAJOR'.
This format requires two arrays, \verb'Ab' and \verb'Ax', each of which are
size \verb'nrows*ncols'.  They correspond to \verb'Container->b' and
\verb'Container->x' in the \verb'GxB_Container' object.  These arrays define
the pattern and values of the matrix \verb'A':

\begin{itemize}
\item \verb'int8_t Ab [nrows*ncols] ;'  The \verb'Ab' array defines which
entries of \verb'A' are present.  If \verb'Ab[i*ncols+j]=1', then the entry
$A(i,j)$ is present, with value \verb'Ax[i*ncols+j]'.  If
\verb'Ab[i*ncols+j]=0', then the entry $A(i,j)$ is not present.  The \verb'Ab'
array must contain only 0s and 1s.  The \verb'nvals' input must exactly match
the number of 1s in the \verb'Ab' array.
\item \verb'type Ax [nrows*ncols] ;'  The \verb'Ax' array defines the values of
entries in the matrix.  If \verb'Ab[p]' is zero, the value of \verb'Ax[p]' is
ignored.  If the matrix is iso-valued, \verb'Ax' has size 1.
\end{itemize}

%-------------------------------------------------------------------------------
\subsubsection{Bitmap, held by column}
%-------------------------------------------------------------------------------
\label{format_bitmap_by_col}

This is the transpose of the bitmap-by-row format.
The \verb'Container->format' is \verb'GxB_BITMAP' and the
\verb'Container->orientation' is \verb'GrB_COLMAJOR'.
The value of the entry $A(i,j)$ is held in \verb'Ax [i+j*nrows]', or
in \verb'Ax[0]' if the matrix is iso-valued.
It is present if \verb'Ab [i+j*nrows]' is 1, and not present if zero.

%-------------------------------------------------------------------------------
\subsubsection{Full, held by row}
%-------------------------------------------------------------------------------
\label{format_full_by_row}

The \verb'Container->format' is \verb'GxB_FULL' and the
\verb'Container->orientation' is \verb'GrB_ROWMAJOR'.  This format is held in a
single \verb'GrB_Vector', \verb'Container->x'.  The $A(i,j)$ entry is in
position \verb'i*ncols+j' in this array, or in position 0 if the matrix is
iso-valued.  All entries are present.

%-------------------------------------------------------------------------------
\subsubsection{Full, held by column}
%-------------------------------------------------------------------------------
\label{format_full_by_col}

This is the transpose of the full-by-row format.
The \verb'Container->format' is \verb'GxB_FULL' and the
\verb'Container->orientation' is \verb'GrB_COLMAJOR'.  This format is held in a
single \verb'GrB_Vector', \verb'Container->x'.  The $A(i,j)$ entry is in
position \verb'i+j*nrows' in this array, or in position 0 if the matrix is
iso-valued.  All entries are present.