# Construct Sparse Arrays
## From coordinates and data
You can construct [`sparse.COO`][] arrays from coordinates and value data.
The `coords` parameter contains the indices where the data is nonzero,
and the `data` parameter contains the data corresponding to those indices.
For example, the following code will generate a $5 \times 5$ diagonal
matrix:
```python
>>> import sparse
>>> coords = [[0, 1, 2, 3, 4],
... [0, 1, 2, 3, 4]]
>>> data = [10, 20, 30, 40, 50]
>>> s = sparse.COO(coords, data, shape=(5, 5))
>>> s
<COO: shape=(5, 5), dtype=int64, nnz=5, fill_value=0>
     0   1   2   3   4
  ┌                    ┐
0 │ 10                 │
1 │     20             │
2 │         30         │
3 │             40     │
4 │                 50 │
  └                    ┘
```
In general, `coords` should be a `(ndim, nnz)` shaped
array. Each row of `coords` holds the indices along one dimension of the
desired sparse array, and each column holds the full index
of one nonzero element. `data` contains
the nonzero elements of the array corresponding to the indices
in `coords`. Its shape should be `(nnz,)`.
If `data` is the same across all the coordinates, it can be passed
in as a scalar. For example, the following produces the $4 \times 4$
identity matrix:
```python
>>> import sparse
>>> coords = [[0, 1, 2, 3],
... [0, 1, 2, 3]]
>>> data = 1
>>> s = sparse.COO(coords, data, shape=(4, 4))
>>> s
<COO: shape=(4, 4), dtype=int64, nnz=4, fill_value=0>
    0  1  2  3
  ┌            ┐
0 │ 1          │
1 │    1       │
2 │       1    │
3 │          1 │
  └            ┘
```
You can, and should, pass in [`numpy.ndarray`][] objects for
`coords` and `data`.
If you omit the `shape` argument, the shape of the resulting array is
inferred from the maximum index in each dimension. If the array extends beyond
the maximum index in `coords`, you should supply a shape
explicitly. For example, without the
`shape` keyword argument, the following would produce a
$4 \times 5$ matrix, when a $5 \times 5$ matrix may be what was
intended.
```python
>>> coords = [[0, 3, 2, 1], [4, 1, 2, 0]]
>>> data = [1, 4, 2, 1]
>>> s = sparse.COO(coords, data, shape=(5, 5))
>>> s
<COO: shape=(5, 5), dtype=int64, nnz=4, fill_value=0>
    0  1  2  3  4
  ┌               ┐
0 │             1 │
1 │ 1             │
2 │       2       │
3 │    4          │
4 │               │
  └               ┘
```
[`sparse.COO`][] arrays support arbitrary fill values. The fill value is the "default"
value: the value taken by every element that is not explicitly stored. It does not
have to be zero. For example, the following builds a (bad) representation of a $2 \times 2$
identity matrix. Note that not all operations support arrays
with nonzero fill values.
```python
>>> coords = [[0, 1], [1, 0]]
>>> data = [0, 0]
>>> s = sparse.COO(coords, data, fill_value=1)
>>> s
<COO: shape=(2, 2), dtype=int64, nnz=2, fill_value=1>
    0  1
  ┌      ┐
0 │    0 │
1 │ 0    │
  └      ┘
```
## From [`scipy.sparse.spmatrix`][]
To construct [`sparse.COO`][] arrays from [spmatrix][scipy.sparse.spmatrix]
objects, you can use the [`sparse.COO.from_scipy_sparse`][] method. As an
example, if `x` is a [`scipy.sparse.spmatrix`][], you can
do the following to get an equivalent [`sparse.COO`][] array:
```python
s = sparse.COO.from_scipy_sparse(x)
```
## From [NumPy arrays][`numpy.ndarray`]
To construct [`sparse.COO`][] arrays from [`numpy.ndarray`][]
objects, you can use the [`sparse.COO.from_numpy`][] method. As an
example, if `x` is a [`numpy.ndarray`][], you can
do the following to get an equivalent [`sparse.COO`][] array:
```python
s = sparse.COO.from_numpy(x)
```
## Generating random [`sparse.COO`][] objects
The [`sparse.random`][] function can be used to create random
[`sparse.COO`][] arrays. For example, the following will generate
a $10 \times 10$ matrix with $10$ nonzero entries,
each in the interval $[0, 1)$.
```python
s = sparse.random((10, 10), density=0.1)
```
## Building [`sparse.COO`][] arrays from [`sparse.DOK`][] arrays
It's possible to build [`sparse.COO`][] arrays from [`sparse.DOK`][] arrays, if it is not
easy to construct the `coords` and `data` in a simple way. [`sparse.DOK`][]
arrays provide a simple builder interface to build [`sparse.COO`][] arrays, but at
this time, they can do little else.
You can get started by defining the shape (and optionally, datatype) of the
[`sparse.DOK`][] array. If you do not specify a dtype, it is inferred from the value
dictionary or is set to `dtype('float64')` if that is not present.
```python
import numpy as np
import sparse
s = sparse.DOK((6, 5, 2))
s2 = sparse.DOK((2, 3, 4), dtype=np.uint8)
```
After this, you can build the array by assigning arrays or scalars to elements
or slices of the original array. Broadcasting rules are followed.
```python
s[1:3, 3:1:-1] = [[6, 5]]
```
DOK arrays also support fancy indexing assignment if and only if all dimensions are indexed.
```python
s[[0, 2], [2, 1], [0, 1]] = 5
s[[0, 3], [0, 4], [0, 1]] = [1, 5]
```
Alongside indexing assignment and retrieval, [`sparse.DOK`][] arrays support applying an arbitrary broadcasting function
to any number of arguments, where the arguments can be [`sparse.SparseArray`][] objects, [`scipy.sparse.spmatrix`][]
objects, or [`numpy.ndarray`][] objects.
```python
x = sparse.random((10, 10), 0.5, format="dok")
y = sparse.random((10, 10), 0.5, format="dok")
sparse.elemwise(np.add, x, y)
```
[`sparse.DOK`][] arrays also support standard ufuncs and operators, including comparison operators,
in combination with other objects implementing the NumPy `ndarray.__array_ufunc__` method. For example,
the following code performs an elementwise equality comparison on the two arrays
and returns a new boolean [`sparse.DOK`][] array.
```python
x = sparse.random((10, 10), 0.5, format="dok")
y = np.random.random((10, 10))
x == y
```
[`sparse.DOK`][] arrays are returned from elemwise functions and standard ufuncs if and only if all
[`sparse.SparseArray`][] objects are [`sparse.DOK`][] arrays. Otherwise, a [`sparse.COO`][] array or dense array is returned.
At the end, you can convert the [`sparse.DOK`][] array to a [`sparse.COO`][] array.
```python
s3 = sparse.COO(s)
```
In addition, it is possible to access single elements and slices of the [`sparse.DOK`][] array
using normal NumPy indexing, as well as fancy indexing if and only if all dimensions are indexed.
Slicing and fancy indexing will always return a new [`sparse.DOK`][] array.
```python
s[1, 2, 1] # 5
s[5, 1, 1] # 0
s[[0, 3], [0, 4], [0, 1]] # <DOK: shape=(2,), dtype=float64, nnz=2, fill_value=0.0>
```
## Converting [`sparse.COO`][] objects to other formats
[`sparse.COO`][] arrays can be converted to [NumPy arrays][numpy.ndarray],
or to some [spmatrix][scipy.sparse.spmatrix] subclasses via the following
methods:
* [`sparse.COO.todense`][]: Converts to a [`numpy.ndarray`][] unconditionally.
* [`sparse.COO.maybe_densify`][]: Converts to a [`numpy.ndarray`][] based on
certain constraints.
* [`sparse.COO.to_scipy_sparse`][]: Converts to a [`scipy.sparse.coo_matrix`][] if
the array is two dimensional.
* [`sparse.COO.tocsr`][]: Converts to a [`scipy.sparse.csr_matrix`][] if
the array is two dimensional.
* [`sparse.COO.tocsc`][]: Converts to a [`scipy.sparse.csc_matrix`][] if
the array is two dimensional.