File: index.rst

package info (click to toggle)
python-sparse 0.16.0a9-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 1,948 kB
  • sloc: python: 9,959; makefile: 8; sh: 3
file content (111 lines) | stat: -rw-r--r-- 4,100 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
Sparse
======

.. raw:: html
	:file: logo.svg

This implements sparse arrays of arbitrary dimension on top of :obj:`numpy` and :obj:`scipy.sparse`.
It generalizes the :obj:`scipy.sparse.coo_matrix` and :obj:`scipy.sparse.dok_matrix` layouts,
but extends beyond just rows and columns to an arbitrary number of dimensions.

Additionally, this project maintains compatibility with the :obj:`numpy.ndarray` interface
rather than the :obj:`numpy.matrix` interface used in :obj:`scipy.sparse`

These differences make this project useful in certain situations
where scipy.sparse matrices are not well suited,
but it should not be considered a full replacement.
The data structures in pydata/sparse complement and can
be used in conjunction with the fast linear algebra routines
inside scipy.sparse. A format conversion or copy may be required.


Motivation
----------

Sparse arrays, or arrays that are mostly empty or filled with zeros,
are common in many scientific applications.
To save space we often avoid storing these arrays in traditional dense formats,
and instead choose different data structures.
Our choice of data structure can significantly affect our storage and computational
costs when working with these arrays.


Design
------

The main data structure in this library follows the
`Coordinate List (COO) <https://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_(COO)>`_
layout for sparse matrices, but extends it to multiple dimensions.

The COO layout, which stores the row index, column index, and value of every element:

=== === ====
row col data
=== === ====
  0   0   10
  0   2   13
  1   3    9
  3   8   21
=== === ====

It is straightforward to extend the COO layout to an arbitrary number of
dimensions:

==== ==== ==== === ====
dim1 dim2 dim3 ... data
==== ==== ==== === ====
  0    0     0   .   10
  0    0     3   .   13
  0    2     2   .    9
  3    1     4   .   21
==== ==== ==== === ====

This makes it easy to *store* a multidimensional sparse array, but we still
need to reimplement all of the array operations like transpose, reshape,
slicing, tensordot, reductions, etc., which can be challenging in general.

This library also includes several other data structures. Similar to COO,
the `Dictionary of Keys (DOK) <https://en.wikipedia.org/wiki/Sparse_matrix#Dictionary_of_keys_(DOK)>`_
format for sparse matrices generalizes well to an arbitrary number of dimensions.
DOK is well-suited for writing and mutating. Most other operations are not supported for DOK.
A common workflow may involve writing an array with DOK and then converting to another
format for other operations.

The `Compressed Sparse Row/Column (CSR/CSC) <https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_column_(CSC_or_CCS)>`_
formats are widely used in scientific computing are now supported
by pydata/sparse. The CSR/CSC formats excel at compression and mathematical operations.
While these formats are restricted to two dimensions, pydata/sparse supports
the GCXS sparse array format, based on
`GCRS/GCCS from <https://ieeexplore.ieee.org/abstract/document/7237032/similar#similar>`_
which generalizes CSR/CSC to n-dimensional arrays.
Like their two-dimensional CSR/CSC counterparts, GCXS arrays compress well.
Whereas the storage cost of COO depends heavily on the number of dimensions of the array,
the number of dimensions only minimally affects the storage cost of GCXS arrays,
which results in favorable compression ratios across many use cases.

Together these formats cover a wide array of applications of sparsity.
Additionally, with each format complying with the :obj:`numpy.ndarray` interface and
following the appropriate dispatching protocols,
pydata/sparse arrays can interact with other array libraries and seamlessly
take part in pydata-ecosystem-based workflows.

LICENSE
-------

This library is licensed under BSD-3

.. toctree::
   :maxdepth: 3
   :hidden:

   install
   quickstart
   construct
   operations
   generated/sparse
   roadmap
   contributing
   changelog
   conduct

.. _scipy.sparse: https://docs.scipy.org/doc/scipy/reference/sparse.html