File: construct.md

package info (click to toggle)
python-sparse 0.17.0-1
  • links: PTS, VCS
  • area: main
  • in suites: sid
  • size: 1,816 kB
  • sloc: python: 11,223; sh: 54; javascript: 10; makefile: 8
file content (237 lines) | stat: -rw-r--r-- 7,536 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
# Construct Sparse Arrays

## From coordinates and data

You can construct [`sparse.COO`][] arrays from coordinates and value data.

The `cords` parameter contains the indices where the data is nonzero,
and the `data` parameter contains the data corresponding to those indices.
For example, the following code will generate a $5 \times 5$ diagonal
matrix:

```python

>>> import sparse

>>> coords = [[0, 1, 2, 3, 4],
...           [0, 1, 2, 3, 4]]
>>> data = [10, 20, 30, 40, 50]
>>> s = sparse.COO(coords, data, shape=(5, 5))
>>> s
<COO: shape=(5, 5), dtype=int64, nnz=5, fill_value=0>
     0    1    2    3    4
  ┌                         ┐
0 │ 10                      │
1 │      20                 │
2 │           30            │
3 │                40       │
4 │                     50  │
  └                         ┘
```

In general `coords` should be a `(ndim, nnz)` shaped
array. Each row of `coords` contains one dimension of the
desired sparse array, and each column contains the index
corresponding to that nonzero element. `data` contains
the nonzero elements of the array corresponding to the indices
in `coords`. Its shape should be `(nnz,)`.

If `data` is the same across all the coordinates, it can be passed
in as a scalar. For example, the following produces the $4 \times 4$
identity matrix:

```python

>>> import sparse

>>> coords = [[0, 1, 2, 3],
...           [0, 1, 2, 3]]
>>> data = 1
>>> s = sparse.COO(coords, data, shape=(4, 4))
>>> s
<COO: shape=(4, 4), dtype=int64, nnz=4, fill_value=0>
     0    1    2    3
  ┌                    ┐
0 │  1                 │
1 │       1            │
2 │            1       │
3 │                 1  │
  └                    ┘
```

You can, and should, pass in [`numpy.ndarray`][] objects for
`coords` and `data`.

In this case, the shape of the resulting array was determined from
the maximum index in each dimension. If the array extends beyond
the maximum index in `coords`, you should supply a shape
explicitly. For example, if we did the following without the
`shape` keyword argument, it would result in a
$4 \times 5$ matrix, but maybe we wanted one that was actually
$5 \times 5$.

```python

>>> coords = [[0, 3, 2, 1], [4, 1, 2, 0]]
>>> data = [1, 4, 2, 1]
>>> s = COO(coords, data, shape=(5, 5))
>>> s
<COO: shape=(5, 5), dtype=int64, nnz=4, fill_value=0>
     0    1    2    3    4
  ┌                         ┐
0 │                      1  │
1 │  1                      │
2 │            2            │
3 │       4                 │
4 │                         │
  └                         ┘
```

[`sparse.COO`][] arrays support arbitrary fill values. Fill values are the "default"
value, or value to not store. This can be given a value other than zero. For
example, the following builds a (bad) representation of a $2 \times 2$
identity matrix. Note that not all operations are supported for operations
with nonzero fill values.

```python

>>> coords = [[0, 1], [1, 0]]
>>> data = [0, 0]
>>> s = COO(coords, data, fill_value=1)
>>> s
<COO: shape=(2, 2), dtype=int64, nnz=2, fill_value=1>
     0    1
  ┌          ┐
0 │       0  │
1 │  0       │
  └          ┘
```

## From [`scipy.sparse.spmatrix`][]

To construct [`sparse.COO`][] array from [spmatrix][scipy.sparse.spmatrix]
objects, you can use the [`sparse.COO.from_scipy_sparse`][] method. As an
example, if `x` is a [scipy.sparse.spmatrix][], you can
do the following to get an equivalent [`sparse.COO`][] array:

```python

s = COO.from_scipy_sparse(x)
```

## From [Numpy arrays][`numpy.ndarray`]

To construct [`sparse.COO`][] arrays from [`numpy.ndarray`][]
objects, you can use the [`sparse.COO.from_numpy`][] method. As an
example, if `x` is a [`numpy.ndarray`][], you can
do the following to get an equivalent [`sparse.COO`][] array:

```python

s = COO.from_numpy(x)
```

## Generating random [`sparse.COO`][] objects

The [`sparse.random`][] method can be used to create random
[`sparse.COO`][] arrays. For example, the following will generate
a $10 \times 10$ matrix with $10$ nonzero entries,
each in the interval $[0, 1)$.

```python

s = sparse.random((10, 10), density=0.1)
```

Building [`sparse.COO`][] Arrays from [`sparse.DOK`][] Arrays

It's possible to build [`sparse.COO`][] arrays from [`sparse.DOK`][] arrays, if it is not
easy to construct the `coords` and `data` in a simple way. [`sparse.DOK`][]
arrays provide a simple builder interface to build [`sparse.COO`][] arrays, but at
this time, they can do little else.

You can get started by defining the shape (and optionally, datatype) of the
[`sparse.DOK`][] array. If you do not specify a dtype, it is inferred from the value
dictionary or is set to `dtype('float64')` if that is not present.

```python

s = DOK((6, 5, 2))
s2 = DOK((2, 3, 4), dtype=np.uint8)
```

After this, you can build the array by assigning arrays or scalars to elements
or slices of the original array. Broadcasting rules are followed.

```python

s[1:3, 3:1:-1] = [[6, 5]]
```

DOK arrays also support fancy indexing assignment if and only if all dimensions are indexed.

```python

s[[0, 2], [2, 1], [0, 1]] = 5
s[[0, 3], [0, 4], [0, 1]] = [1, 5]
```

Alongside indexing assignment and retrieval, [`sparse.DOK`][] arrays support any arbitrary broadcasting function
to any number of arguments where the arguments can be [`sparse.SparseArray`][] objects, [`scipy.sparse.spmatrix`][]
objects, or [`numpy.ndarray`][].

```python

x = sparse.random((10, 10), 0.5, format="dok")
y = sparse.random((10, 10), 0.5, format="dok")
sparse.elemwise(np.add, x, y)
```

[`sparse.DOK`][] arrays also support standard ufuncs and operators, including comparison operators,
in combination with other objects implementing the *numpy* *ndarray.\__array_ufunc\__* method. For example,
the following code will perform elementwise equality comparison on the two arrays
and return a new boolean [`sparse.DOK`][] array.

```python

x = sparse.random((10, 10), 0.5, format="dok")
y = np.random.random((10, 10))
x == y
```

[`sparse.DOK`][] arrays are returned from elemwise functions and standard ufuncs if and only if all
[`sparse.SparseArray`][] objects are [`sparse.DOK`][] arrays. Otherwise, a [`sparse.COO`][] array or dense array are returned.

At the end, you can convert the [`sparse.DOK`][] array to a [`sparse.COO`][] arrays.

```python

s3 = COO(s)
```

In addition, it is possible to access single elements and slices of the [`sparse.DOK`][] array
using normal Numpy indexing, as well as fancy indexing if and only if all dimensions are indexed.
Slicing and fancy indexing will always return a new DOK array.

```python

s[1, 2, 1]  # 5
s[5, 1, 1]  # 0
s[[0, 3], [0, 4], [0, 1]] # <DOK: shape=(2,), dtype=float64, nnz=2, fill_value=0.0>
```

## Converting [`sparse.COO`][] objects to other Formats

[`sparse.COO`][] arrays can be converted to [Numpy arrays][numpy.ndarray],
or to some [spmatrix][scipy.sparse.spmatrix] subclasses via the following
methods:

* [`sparse.COO.todense`][]: Converts to a [`numpy.ndarray`][] unconditionally.
* [`sparse.COO.maybe_densify`][]: Converts to a [`numpy.ndarray`][] based on
   certain constraints.
* [`sparse.COO.to_scipy_sparse`][]: Converts to a [`scipy.sparse.coo_matrix`][] if
   the array is two dimensional.
* [`sparse.COO.tocsr`][]: Converts to a [`scipy.sparse.csr_matrix`][] if
   the array is two dimensional.
* [`sparse.COO.tocsc`][]: Converts to a [`scipy.sparse.csc_matrix`][] if
   the array is two dimensional.