File: README.md

<a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a>

# cymem: A Cython Memory Helper

cymem provides two small memory-management helpers for Cython. They make it easy
to tie memory to a Python object's life-cycle, so that the memory is freed when
the object is garbage collected.

[![tests](https://github.com/explosion/cymem/actions/workflows/tests.yml/badge.svg)](https://github.com/explosion/cymem/actions/workflows/tests.yml)
[![pypi Version](https://img.shields.io/pypi/v/cymem.svg?style=flat-square&logo=pypi&logoColor=white)](https://pypi.python.org/pypi/cymem)
[![conda Version](https://img.shields.io/conda/vn/conda-forge/cymem.svg?style=flat-square&logo=conda-forge&logoColor=white)](https://anaconda.org/conda-forge/cymem)
[![Python wheels](https://img.shields.io/badge/wheels-%E2%9C%93-4c1.svg?longCache=true&style=flat-square&logo=python&logoColor=white)](https://github.com/explosion/wheelwright/releases)

## Overview

The most useful helper is `cymem.Pool`, which acts as a thin wrapper around the
`calloc` function:

```python
from cymem.cymem cimport Pool
cdef Pool mem = Pool()
data1 = <int*>mem.alloc(10, sizeof(int))
data2 = <float*>mem.alloc(12, sizeof(float))
```

The `Pool` object saves the memory addresses internally, and frees them when the
object is garbage collected. Typically you'll attach the `Pool` to some cdef'd
class. This is particularly handy for deeply nested structs, which have
complicated initialization functions. Just pass the `Pool` object into the
initializer, and you don't have to worry about freeing your struct at all — all
of the calls to `Pool.alloc` will be automatically freed when the `Pool`
expires.
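
For instance, a minimal sketch (with hypothetical names) of attaching a `Pool` to a
cdef'd class and passing it into a struct initializer might look like this:

```python
# A minimal sketch with hypothetical names: the Pool owned by CountsArray frees
# every address handed out by counts_init when the CountsArray object dies.
from cymem.cymem cimport Pool

cdef struct Counts:
    size_t length
    int* values

cdef Counts* counts_init(Pool mem, size_t length) except NULL:
    cdef Counts* counts = <Counts*>mem.alloc(1, sizeof(Counts))
    counts.length = length
    counts.values = <int*>mem.alloc(length, sizeof(int))
    return counts

cdef class CountsArray:
    cdef Pool mem
    cdef Counts* counts

    def __init__(self, size_t length):
        self.mem = Pool()
        self.counts = counts_init(self.mem, length)
    # No __dealloc__ needed: the Pool frees everything when CountsArray
    # is garbage collected.
```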

## Installation

Installation is via [pip](https://pypi.python.org/pypi/pip), and requires
[Cython](http://cython.org). Before installing, make sure that your `pip`,
`setuptools` and `wheel` are up to date.

```bash
pip install -U pip setuptools wheel
pip install cymem
```
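
Since `Pool` is cimported, your own module has to be a Cython module, compiled with
Cython and cymem installed. A minimal, hypothetical `setup.py` sketch (the file and
module names are placeholders) might look like:

```python
# Hypothetical build script for a module that cimports cymem.
# Assumes Cython and cymem are already installed in the build environment.
from setuptools import Extension, setup
from Cython.Build import cythonize

setup(
    name="matrix_array",
    ext_modules=cythonize([Extension("matrix_array", ["matrix_array.pyx"])]),
)
```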

## Example Use Case: An array of structs

Let's say we want a sequence of sparse matrices. We need fast access, and a
Python list isn't performing well enough. So, we want a C-array or C++ vector,
which means we need the sparse matrix to be a C-level struct — it can't be a
Python class. We can write this easily enough in Cython:

```python
"""Example without Cymem

To use an array of structs, we must carefully walk the data structure when
we deallocate it.
"""

from libc.stdlib cimport calloc, free

cdef struct SparseRow:
    size_t length
    size_t* indices
    double* values

cdef struct SparseMatrix:
    size_t length
    SparseRow* rows

cdef class MatrixArray:
    cdef size_t length
    cdef SparseMatrix** matrices

    def __cinit__(self, list py_matrices):
        self.length = 0
        self.matrices = NULL

    def __init__(self, list py_matrices):
        self.length = len(py_matrices)
        self.matrices = <SparseMatrix**>calloc(len(py_matrices), sizeof(SparseMatrix*))

        for i, py_matrix in enumerate(py_matrices):
            self.matrices[i] = sparse_matrix_init(py_matrix)

    def __dealloc__(self):
        for i in range(self.length):
            sparse_matrix_free(self.matrices[i])
        free(self.matrices)


cdef SparseMatrix* sparse_matrix_init(list py_matrix) except NULL:
    sm = <SparseMatrix*>calloc(1, sizeof(SparseMatrix))
    sm.length = len(py_matrix)
    sm.rows = <SparseRow*>calloc(sm.length, sizeof(SparseRow))
    cdef size_t i, j
    cdef dict py_row
    cdef size_t idx
    cdef double value
    for i, py_row in enumerate(py_matrix):
        sm.rows[i].length = len(py_row)
        sm.rows[i].indices = <size_t*>calloc(sm.rows[i].length, sizeof(size_t))
        sm.rows[i].values = <double*>calloc(sm.rows[i].length, sizeof(double))
        for j, (idx, value) in enumerate(py_row.items()):
            sm.rows[i].indices[j] = idx
            sm.rows[i].values[j] = value
    return sm


cdef void sparse_matrix_free(SparseMatrix* sm) except *:
    cdef size_t i
    for i in range(sm.length):
        free(sm.rows[i].indices)
        free(sm.rows[i].values)
    free(sm.rows)
    free(sm)
```

We wrap the data structure in a Python ref-counted class at as low a level as we
can, given our performance constraints. This allows us to allocate and free the
memory in the `__cinit__` and `__dealloc__` Cython special methods.

However, it's very easy to make mistakes when writing the `__dealloc__` and
`sparse_matrix_free` functions, leading to memory leaks. With cymem you don't
have to write these deallocators at all. Instead, you write as follows:

```python
"""Example with Cymem.

Memory allocation is hidden behind the Pool class, which remembers the
addresses it gives out.  When the Pool object is garbage collected, all of
its addresses are freed.

We don't need to write MatrixArray.__dealloc__ or sparse_matrix_free,
eliminating a common class of bugs.
"""
from cymem.cymem cimport Pool

cdef struct SparseRow:
    size_t length
    size_t* indices
    double* values

cdef struct SparseMatrix:
    size_t length
    SparseRow* rows


cdef class MatrixArray:
    cdef size_t length
    cdef SparseMatrix** matrices
    cdef Pool mem

    def __cinit__(self, list py_matrices):
        self.mem = None
        self.length = 0
        self.matrices = NULL

    def __init__(self, list py_matrices):
        self.mem = Pool()
        self.length = len(py_matrices)
        self.matrices = <SparseMatrix**>self.mem.alloc(self.length, sizeof(SparseMatrix*))
        for i, py_matrix in enumerate(py_matrices):
            self.matrices[i] = sparse_matrix_init_cymem(self.mem, py_matrix)

cdef SparseMatrix* sparse_matrix_init_cymem(Pool mem, list py_matrix) except NULL:
    sm = <SparseMatrix*>mem.alloc(1, sizeof(SparseMatrix))
    sm.length = len(py_matrix)
    sm.rows = <SparseRow*>mem.alloc(sm.length, sizeof(SparseRow))
    cdef size_t i, j
    cdef dict py_row
    cdef size_t idx
    cdef double value
    for i, py_row in enumerate(py_matrix):
        sm.rows[i].length = len(py_row)
        sm.rows[i].indices = <size_t*>mem.alloc(sm.rows[i].length, sizeof(size_t))
        sm.rows[i].values = <double*>mem.alloc(sm.rows[i].length, sizeof(double))
        for j, (idx, value) in enumerate(py_row.items()):
            sm.rows[i].indices[j] = idx
            sm.rows[i].values[j] = value
    return sm
```

All that the `Pool` class does is remember the addresses it gives out. When the
`MatrixArray` object is garbage-collected, the `Pool` object will also be
garbage collected, which triggers a call to `Pool.__dealloc__`. The `Pool` then
frees all of its addresses. This saves you from walking back over your nested
data structures to free them, eliminating a common class of errors.
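
The `Pool` also tracks individual addresses, so you can resize or release a single
allocation before the pool itself dies. Here's a minimal sketch, assuming the
`realloc` and `free` methods declared in cymem's `cymem.pxd` (the size argument to
`realloc` is in bytes):

```python
from cymem.cymem cimport Pool

cdef Pool mem = Pool()
cdef int* data = <int*>mem.alloc(10, sizeof(int))

# Grow the block; the Pool forgets the old address and records the new one.
data = <int*>mem.realloc(data, 20 * sizeof(int))

# Release early if you like; the Pool removes the address from its records,
# so it won't be freed a second time when the Pool is garbage collected.
mem.free(data)
```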

## Custom Allocators

Sometimes external C libraries use private functions to allocate and free
objects, but we'd still like the `Pool` to manage the addresses for us. In that
case, pass custom allocation and free wrappers when constructing the `Pool`:

```python
from cymem.cymem cimport Pool, WrapMalloc, WrapFree
cdef Pool mem = Pool(WrapMalloc(priv_malloc), WrapFree(priv_free))
```
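
As a concrete (if artificial) stand-in for the private functions, here is a minimal
sketch that wraps libc's `malloc` and `free`; it assumes `WrapMalloc` and `WrapFree`
accept C function pointers with `malloc`/`free`-style signatures, as declared in
cymem's `cymem.pxd`:

```python
from libc.stdlib cimport malloc, free
from cymem.cymem cimport Pool, WrapMalloc, WrapFree

# Every Pool.alloc call below goes through the wrapped malloc, and every
# address is released through the wrapped free when the Pool is collected.
cdef Pool mem = Pool(WrapMalloc(malloc), WrapFree(free))
cdef double* weights = <double*>mem.alloc(100, sizeof(double))
```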