1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869
|
:PEP: 3118
:Title: Revising the buffer protocol
:Version: $Revision$
:Last-Modified: $Date$
:Authors: Travis Oliphant <oliphant@ee.byu.edu>, Carl Banks <pythondev@aerojockey.com>
:Status: Draft
:Type: Standards Track
:Content-Type: text/x-rst
:Created: 28-Aug-2006
:Python-Version: 3000
Abstract
========
This PEP proposes re-designing the buffer interface (PyBufferProcs
function pointers) to improve the way Python allows memory sharing
in Python 3.0
In particular, it is proposed that the character buffer portion
of the API be elminated and the multiple-segment portion be
re-designed in conjunction with allowing for strided memory
to be shared. In addition, the new buffer interface will
allow the sharing of any multi-dimensional nature of the
memory and what data-format the memory contains.
This interface will allow any extension module to either
create objects that share memory or create algorithms that
use and manipulate raw memory from arbitrary objects that
export the interface.
Rationale
=========
The Python 2.X buffer protocol allows different Python types to
exchange a pointer to a sequence of internal buffers. This
functionality is *extremely* useful for sharing large segments of
memory between different high-level objects, but it is too limited and
has issues:
1. There is the little used "sequence-of-segments" option
(bf_getsegcount) that is not well motivated.
2. There is the apparently redundant character-buffer option
(bf_getcharbuffer)
3. There is no way for a consumer to tell the buffer-API-exporting
object it is "finished" with its view of the memory and
therefore no way for the exporting object to be sure that it is
safe to reallocate the pointer to the memory that it owns (for
example, the array object reallocating its memory after sharing
it with the buffer object which held the original pointer led
to the infamous buffer-object problem).
4. Memory is just a pointer with a length. There is no way to
describe what is "in" the memory (float, int, C-structure, etc.)
5. There is no shape information provided for the memory. But,
several array-like Python types could make use of a standard
way to describe the shape-interpretation of the memory
(wxPython, GTK, pyQT, CVXOPT, PyVox, Audio and Video
Libraries, ctypes, NumPy, data-base interfaces, etc.)
6. There is no way to share discontiguous memory (except through
the sequence of segments notion).
There are two widely used libraries that use the concept of
discontiguous memory: PIL and NumPy. Their view of discontiguous
arrays is different, though. The proposed buffer interface allows
sharing of either memory model. Exporters will use only one
approach and consumers may choose to support discontiguous
arrays of each type however they choose.
NumPy uses the notion of constant striding in each dimension as its
basic concept of an array. With this concept, a simple sub-region
of a larger array can be described without copying the data.
Thus, stride information is the additional information that must be
shared.
The PIL uses a more opaque memory representation. Sometimes an
image is contained in a contiguous segment of memory, but sometimes
it is contained in an array of pointers to the contiguous segments
(usually lines) of the image. The PIL is where the idea of multiple
buffer segments in the original buffer interface came from.
NumPy's strided memory model is used more often in computational
libraries and because it is so simple it makes sense to support
memory sharing using this model. The PIL memory model is sometimes
used in C-code where a 2-d array can be then accessed using double
pointer indirection: e.g. image[i][j].
The buffer interface should allow the object to export either of these
memory models. Consumers are free to either require contiguous memory
or write code to handle one or both of these memory models.
Proposal Overview
=================
* Eliminate the char-buffer and multiple-segment sections of the
buffer-protocol.
* Unify the read/write versions of getting the buffer.
* Add a new function to the interface that should be called when
the consumer object is "done" with the memory area.
* Add a new variable to allow the interface to describe what is in
memory (unifying what is currently done now in struct and
array)
* Add a new variable to allow the protocol to share shape information
* Add a new variable for sharing stride information
* Add a new mechanism for sharing arrays that must
be accessed using pointer indirection.
* Fix all objects in the core and the standard library to conform
to the new interface
* Extend the struct module to handle more format specifiers
* Extend the buffer object into a new memory object which places
a Python veneer around the buffer interface.
* Add a few functions to make it easy to copy contiguous data
in and out of object supporting the buffer interface.
Specification
=============
While the new specification allows for complicated memory sharing.
Simple contiguous buffers of bytes can still be obtained from an
object. In fact, the new protocol allows a standard mechanism for
doing this even if the original object is not represented as a
contiguous chunk of memory.
The easiest way to obtain a simple contiguous chunk of memory is
to use the provided C-API to obtain a chunk of memory.
Change the PyBufferProcs structure to
::
typedef struct {
getbufferproc bf_getbuffer;
releasebufferproc bf_releasebuffer;
}
::
typedef int (*getbufferproc)(PyObject *obj, PyBuffer *view, int flags)
This function returns 0 on success and -1 on failure (and raises an
error). The first variable is the "exporting" object. The second
argument is the address to a bufferinfo structure. If view is NULL,
then no information is returned but a lock on the memory is still
obtained. In this case, the corresponding releasebuffer should also
be called with NULL.
The third argument indicates what kind of buffer the exporter is
allowed to return. It essentially tells the exporter what kind of
memory area the consumer can deal with. It also indicates what
members of the PyBuffer structure the consumer is going to care about.
The exporter can use this information to simplify how much of the PyBuffer
structure is filled in and/or raise an error if the object can't support
a simpler view of its memory.
Thus, the caller can request a simple "view" and either receive it or
have an error raised if it is not possible.
All of the following assume that at least buf, len, and readonly
will always be utilized by the caller.
Py_BUF_SIMPLE
The returned buffer will be assumed to be readable (the object may
or may not have writeable memory). Only the buf, len, and readonly
variables may be accessed. The format will be assumed to be
unsigned bytes . This is a "stand-alone" flag constant. It never
needs to be \|'d to the others. The exporter will raise an
error if it cannot provide such a contiguous buffer.
Py_BUF_WRITEABLE
The returned buffer must be writeable. If it is not writeable,
then raise an error.
Py_BUF_READONLY
The returned buffer must be readonly. If the object is already
read-only or it can make its memory read-only (and there are no
other views on the object) then it should do so and return the
buffer information. If the object does not have read-only memory
(or cannot make it read-only), then an error should be raised.
Py_BUF_FORMAT
The returned buffer must have true format information. This would
be used when the consumer is going to be checking for what 'kind'
of data is actually stored. An exporter should always be able
to provide this information if requested.
Py_BUF_SHAPE
The returned buffer must have shape information. The memory will
be assumed C-style contiguous (last dimension varies the fastest).
The exporter may raise an error if it cannot provide this kind
of contiguous buffer.
Py_BUF_STRIDES (implies Py_BUF_SHAPE)
The returned buffer must have strides information. This would be
used when the consumer can handle strided, discontiguous arrays.
Handling strides automatically assumes you can handle shape.
The exporter may raise an error if cannot provide a strided-only
representation of the data (i.e. without the suboffsets).
Py_BUF_OFFSETS (implies Py_BUF_STRIDES)
The returned buffer must have suboffsets information. This would
be used when the consumer can handle indirect array referencing
implied by these suboffsets.
Py_BUF_FULL (Py_BUF_OFFSETS | Py_BUF_WRITEABLE | Py_BUF_FORMAT)
Thus, the consumer simply wanting a contiguous chunk of bytes from
the object would use Py_BUF_SIMPLE, while a consumer that understands
how to make use of the most complicated cases could use Py_BUF_INDIRECT.
If format information is going to be probed, then Py_BUF_FORMAT must
be \|'d to the flags otherwise the consumer assumes it is unsigned
bytes.
There is a C-API that simple exporting objects can use to fill-in the
buffer info structure correctly according to the provided flags if a
contiguous chunk of "unsigned bytes" is all that can be exported.
The bufferinfo structure is::
struct bufferinfo {
void *buf;
Py_ssize_t len;
int readonly;
const char *format;
int ndims;
Py_ssize_t *shape;
Py_ssize_t *strides;
Py_ssize_t *suboffsets;
int itemsize;
void *internal;
} PyBuffer;
Before calling this function, the bufferinfo structure can be filled
with whatever. Upon return from getbufferproc, the bufferinfo
structure is filled in with relevant information about the buffer.
This same bufferinfo structure must be passed to bf_releasebuffer (if
available) when the consumer is done with the memory. The caller is
responsible for keeping a reference to obj until releasebuffer is
called (i.e. this call does not alter the reference count of obj).
The members of the bufferinfo structure are:
buf
a pointer to the start of the memory for the object
len
the total bytes of memory the object uses. This should be the
same as the product of the shape array multiplied by the number of
bytes per item of memory.
readonly
an integer variable to hold whether or not the memory is
readonly. 1 means the memory is readonly, zero means the
memory is writeable.
format
a NULL-terminated format-string (following the struct-style syntax
including extensions) indicating what is in each element of
memory. The number of elements is len / itemsize, where itemsize
is the number of bytes implied by the format. For standard
unsigned bytes use a format string of "B".
ndims
a variable storing the number of dimensions the memory represents.
Must be >=0.
shape
an array of ``Py_ssize_t`` of length ``ndims`` indicating the
shape of the memory as an N-D array. Note that ``((*shape)[0] *
... * (*shape)[ndims-1])*itemsize = len``. If ndims is 0 (indicating
a scalar), then this must be NULL.
strides
address of a ``Py_ssize_t*`` variable that will be filled with a
pointer to an array of ``Py_ssize_t`` of length ``ndims`` (or NULL
if ndims is 0). indicating the number of bytes to skip to get to
the next element in each dimension. If this is not requested by
the caller (BUF_STRIDES is not set), then this member of the
structure will not be used and the consumer is assuming the array
is C-style contiguous. If this is not the case, then an error
should be raised. If this member is requested by the caller
(BUF_STRIDES is set), then it must be filled in.
suboffsets
address of a ``Py_ssize_t *`` variable that will be filled with a
pointer to an array of ``Py_ssize_t`` of length ``*ndims``. If
these suboffset numbers are >=0, then the value stored along the
indicated dimension is a pointer and the suboffset value dictates
how many bytes to add to the pointer after de-referencing. A
suboffset value that it negative indicates that no de-referencing
should occur (striding in a contiguous memory block). If all
suboffsets are negative (i.e. no de-referencing is needed, then
this must be NULL.
For clarity, here is a function that returns a pointer to the
element in an N-D array pointed to by an N-dimesional index when
there are both strides and suboffsets.::
void* get_item_pointer(int ndim, void* buf, Py_ssize_t* strides,
Py_ssize_t* suboffsets, Py_ssize_t *indices) {
char* pointer = (char*)buf;
int i;
for (i = 0; i < ndim; i++) {
pointer += strides[i]*indices[i];
if (suboffsets[i] >=0 ) {
pointer = *((char**)pointer) + suboffsets[i];
}
}
return (void*)pointer;
}
Notice the suboffset is added "after" the dereferencing occurs.
Thus slicing in the ith dimension would add to the suboffsets in
the (i-1)st dimension. Slicing in the first dimension would change
the location of the starting pointer directly (i.e. buf would
be modified).
itemsize
This is a storage for the itemsize of each element of the shared
memory. It can be obtained using PyBuffer_SizeFromFormat but an
exporter may know it without making this call and thus storing it
is more convenient and faster.
internal
This is for use internally by the exporting object. For example,
this might be re-cast as an integer by the exporter and used to
store flags about whether or not the shape, strides, and suboffsets
arrays must be freed when the buffer is released. The consumer
should never touch this value.
The exporter is responsible for making sure the memory pointed to by
buf, format, shape, strides, and suboffsets is valid until
releasebuffer is called. If the exporter wants to be able to change
shape, strides, and/or suboffsets before releasebuffer is called then
it should allocate those arrays when getbuffer is called (pointing to
them in the buffer-info structure provided) and free them when
releasebuffer is called.
The same bufferinfo struct should be used in the release-buffer
interface call. The caller is responsible for the memory of the
bufferinfo structure itself.
``typedef int (*releasebufferproc)(PyObject *obj, PyBuffer *view)``
Callers of getbufferproc must make sure that this function is
called when memory previously acquired from the object is no
longer needed. The exporter of the interface must make sure that
any memory pointed to in the bufferinfo structure remains valid
until releasebuffer is called.
Both of these routines are optional for a type object
If the releasebuffer function is not provided then it does not ever
need to be called.
Exporters will need to define a releasebuffer function if they can
re-allocate their memory, strides, shape, suboffsets, or format
variables which they might share through the struct bufferinfo.
Several mechanisms could be used to keep track of how many getbuffer
calls have been made and shared. Either a single variable could be
used to keep track of how many "views" have been exported, or a
linked-list of bufferinfo structures filled in could be maintained in
each object.
All that is specifically required by the exporter, however, is to
ensure that any memory shared through the bufferinfo structure remains
valid until releasebuffer is called on the bufferinfo structure.
New C-API calls are proposed
============================
::
int PyObject_CheckBuffer(PyObject *obj)
Return 1 if the getbuffer function is available otherwise 0.
::
int PyObject_GetBuffer(PyObject *obj, PyBuffer *view,
int flags)
This is a C-API version of the getbuffer function call. It checks to
make sure object has the required function pointer and issues the
call. Returns -1 and raises an error on failure and returns 0 on
success.
::
int PyObject_ReleaseBuffer(PyObject *obj, PyBuffer *view)
This is a C-API version of the releasebuffer function call. It checks
to make sure the object has the required function pointer and issues
the call. Returns 0 on success and -1 (with an error raised) on
failure. This function always succeeds if there is no releasebuffer
function for the object.
::
PyObject *PyObject_GetMemoryView(PyObject *obj)
Return a memory-view object from an object that defines the buffer interface.
A memory-view object is an extended buffer object that could replace
the buffer object (but doesn't have to). It's C-structure is
::
typedef struct {
PyObject_HEAD
PyObject *base;
int ndims;
Py_ssize_t *starts; /* slice starts */
Py_ssize_t *stops; /* slice stops */
Py_ssize_t *steps; /* slice steps */
} PyMemoryViewObject;
This is functionally similar to the current buffer object except only
a reference to base is kept. The actual memory for base must be
re-grabbed using the buffer-protocol, whenever it is needed.
The getbuffer and releasebuffer for this object use the underlying
base object (adjusted using the slice information). If the number of
dimensions of the base object (or the strides or the size) has changed
when a new view is requested, then the getbuffer will trigger an error.
This memory-view object will support mult-dimensional slicing. Slices
of the memory-view object are other memory-view objects. When an
"element" from the memory-view is returned it is always a tuple of
bytes object + format string which can then be interpreted using the
struct module if desired.
::
int PyBuffer_SizeFromFormat(const char *)
Return the implied itemsize of the data-format area from a struct-style
description.
::
int PyObject_GetContiguous(PyObject *obj, void **buf, Py_ssize_t *len,
char **format, char fortran)
Return a contiguous chunk of memory representing the buffer. If a
copy is made then return 1. If no copy was needed return 0. If an
error occurred in probing the buffer interface, then return -1. The
contiguous chunk of memory is pointed to by ``*buf`` and the length of
that memory is ``*len``. If the object is multi-dimensional, then if
fortran is 'F', the first dimension of the underlying array will vary
the fastest in the buffer. If fortran is 'C', then the last dimension
will vary the fastest (C-style contiguous). If fortran is 'A', then it
does not matter and you will get whatever the object decides is more
efficient.
::
int PyObject_CopyToObject(PyObject *obj, void *buf, Py_ssize_t len,
char fortran)
Copy ``len`` bytes of data pointed to by the contiguous chunk of
memory pointed to by ``buf`` into the buffer exported by obj. Return
0 on success and return -1 and raise an error on failure. If the
object does not have a writeable buffer, then an error is raised. If
fortran is 'F', then if the object is multi-dimensional, then the data
will be copied into the array in Fortran-style (first dimension varies
the fastest). If fortran is 'C', then the data will be copied into the
array in C-style (last dimension varies the fastest). If fortran is 'A', then
it does not matter and the copy will be made in whatever way is more
efficient.
::
void PyBuffer_FreeMem(void *buf)
This function frees the memory returned by PyObject_GetContiguous if a
copy was made. Do not call this function unless
PyObject_GetContiguous returns a 1 indicating that new memory was
created.
These last three C-API calls allow a standard way of getting data in and
out of Python objects into contiguous memory areas no matter how it is
actually stored. These calls use the extended buffer interface to perform
their work.
::
int PyBuffer_IsContiguous(PyBuffer *view, char fortran);
Return 1 if the memory defined by the view object is C-style (fortran = 'C')
or Fortran-style (fortran = 'A') contiguous. Return 0 otherwise.
::
void PyBuffer_FillContiguousStrides(int *ndims, Py_ssize_t *shape,
int itemsize,
Py_ssize_t *strides, char fortran)
Fill the strides array with byte-strides of a contiguous (C-style if
fortran is 0 or Fortran-style if fortran is 1) array of the given
shape with the given number of bytes per element.
::
int PyBuffer_FillInfo(PyBuffer *view, void *buf,
Py_ssize_t len, int readonly, int infoflags)
Fills in a buffer-info structure correctly for an exporter that can
only share a contiguous chunk of memory of "unsigned bytes" of the
given length. Returns 0 on success and -1 (with raising an error) on
error.
Additions to the struct string-syntax
=====================================
The struct string-syntax is missing some characters to fully
implement data-format descriptions already available elsewhere (in
ctypes and NumPy for example). The Python 2.5 specification is
at http://docs.python.org/lib/module-struct.html
Here are the proposed additions:
================ ===========
Character Description
================ ===========
't' bit (number before states how many bits)
'?' platform _Bool type
'g' long double
'c' ucs-1 (latin-1) encoding
'u' ucs-2
'w' ucs-4
'O' pointer to Python Object
'Z' complex (whatever the next specifier is)
'&' specific pointer (prefix before another charater)
'T{}' structure (detailed layout inside {})
'(k1,k2,...,kn)' multi-dimensional array of whatever follows
':name:' optional name of the preceeding element
'X{}' pointer to a function (optional function
signature inside {})
' \n\t' ignored (allow better readability)
-- this may already be true
================ ===========
The struct module will be changed to understand these as well and
return appropriate Python objects on unpacking. Unpacking a
long-double will return a decimal object or a ctypes long-double.
Unpacking 'u' or 'w' will return Python unicode. Unpacking a
multi-dimensional array will return a list (of lists if >1d).
Unpacking a pointer will return a ctypes pointer object. Unpacking a
function pointer will return a ctypes call-object (perhaps). Unpacking
a bit will return a Python Bool. White-space in the struct-string
syntax will be ignored if it isn't already. Unpacking a named-object
will return some kind of named-tuple-like object that acts like a
tuple but whose entries can also be accessed by name. Unpacking a
nested structure will return a nested tuple.
Endian-specification ('!', '@','=','>','<', '^') is also allowed
inside the string so that it can change if needed. The
previously-specified endian string is in force until changed. The
default endian is '@' which means native data-types and alignment. If
un-aligned, native data-types are requested, then the endian
specification is '^'.
According to the struct-module, a number can preceed a character
code to specify how many of that type there are. The
(k1,k2,...,kn) extension also allows specifying if the data is
supposed to be viewed as a (C-style contiguous, last-dimension
varies the fastest) multi-dimensional array of a particular format.
Functions should be added to ctypes to create a ctypes object from
a struct description, and add long-double, and ucs-2 to ctypes.
Examples of Data-Format Descriptions
====================================
Here are some examples of C-structures and how they would be
represented using the struct-style syntax.
<named> is the constructor for a named-tuple (not-specified yet).
float
'f' <--> Python float
complex double
'Zd' <--> Python complex
RGB Pixel data
'BBB' <--> (int, int, int)
'B:r: B:g: B:b:' <--> <named>((int, int, int), ('r','g','b'))
Mixed endian (weird but possible)
'>i:big: <i:little:' <--> <named>((int, int), ('big', 'little'))
Nested structure
::
struct {
int ival;
struct {
unsigned short sval;
unsigned char bval;
unsigned char cval;
} sub;
}
"""i:ival:
T{
H:sval:
B:bval:
B:cval:
}:sub:
"""
Nested array
::
struct {
int ival;
double data[16*4];
}
"""i:ival:
(16,4)d:data:
"""
Code to be affected
===================
All objects and modules in Python that export or consume the old
buffer interface will be modified. Here is a partial list.
* buffer object
* bytes object
* string object
* array module
* struct module
* mmap module
* ctypes module
Anything else using the buffer API.
Issues and Details
==================
It is intended that this PEP will be back-ported to Python 2.6 by
adding the C-API and the two functions to the existing buffer
protocol.
The proposed locking mechanism relies entirely on the exporter object
to not invalidate any of the memory pointed to by the buffer structure
until a corresponding releasebuffer is called. If it wants to be able
to change its own shape and/or strides arrays, then it needs to create
memory for these in the bufferinfo structure and copy information
over.
The sharing of strided memory and suboffsets is new and can be seen as
a modification of the multiple-segment interface. It is motivated by
NumPy and the PIL. NumPy objects should be able to share their
strided memory with code that understands how to manage strided memory
because strided memory is very common when interfacing with compute
libraries.
Also, with this approach it should be possible to write generic code
that works with both kinds of memory.
Memory management of the format string, the shape array, the strides
array, and the suboffsets array in the bufferinfo structure is always
the responsibility of the exporting object. The consumer should not
set these pointers to any other memory or try to free them.
Several ideas were discussed and rejected:
Having a "releaser" object whose release-buffer was called. This
was deemed unacceptable because it caused the protocol to be
asymmetric (you called release on something different than you
"got" the buffer from). It also complicated the protocol without
providing a real benefit.
Passing all the struct variables separately into the function.
This had the advantage that it allowed one to set NULL to
variables that were not of interest, but it also made the function
call more difficult. The flags variable allows the same
ability of consumers to be "simple" in how they call the protocol.
Code
========
The authors of the PEP promise to contribute and maintain the code for
this proposal but will welcome any help.
Examples
=========
Ex. 1
-----------
This example shows how an image object that uses contiguous lines might expose its buffer.
::
struct rgba {
unsigned char r, g, b, a;
};
struct ImageObject {
PyObject_HEAD;
...
struct rgba** lines;
Py_ssize_t height;
Py_ssize_t width;
Py_ssize_t shape_array[2];
Py_ssize_t stride_array[2];
Py_ssize_t view_count;
};
"lines" points to malloced 1-D array of (struct rgba*). Each pointer
in THAT block points to a seperately malloced array of (struct rgba).
In order to access, say, the red value of the pixel at x=30, y=50, you'd use "lines[50][30].r".
So what does ImageObject's getbuffer do? Leaving error checking out::
int Image_getbuffer(PyObject *self, PyBuffer *view, int flags) {
static Py_ssize_t suboffsets[2] = { -1, 0 };
view->buf = self->lines;
view->len = self->height*self->width;
view->readonly = 0;
view->ndims = 2;
self->shape_array[0] = height;
self->shape_array[1] = width;
view->shape = &self->shape_array;
self->stride_array[0] = sizeof(struct rgba*);
self->stride_array[1] = sizeof(struct rgba);
view->strides = &self->stride_array;
view->suboffsets = suboffsets;
self->view_count ++;
return 0;
}
int Image_releasebuffer(PyObject *self, PyBuffer *view) {
self->view_count--;
return 0;
}
Ex. 2
-----------
This example shows how an object that wants to expose a contiguous
chunk of memory (which will never be re-allocated while the object is
alive) would do that.
::
int myobject_getbuffer(PyObject *self, PyBuffer *view, int flags) {
void *buf;
Py_ssize_t len;
int readonly=0;
buf = /* Point to buffer */
len = /* Set to size of buffer */
readonly = /* Set to 1 if readonly */
return PyObject_FillBufferInfo(view, buf, len, readonly, flags);
}
No releasebuffer is necessary because the memory will never
be re-allocated so the locking mechanism is not needed.
Ex. 3
-----------
A consumer that wants to only get a simple contiguous chunk of bytes
from a Python object, obj would do the following:
::
PyBuffer view;
int ret;
if (PyObject_GetBuffer(obj, &view, Py_BUF_SIMPLE) < 0) {
/* error return */
}
/* Now, view.buf is the pointer to memory
view.len is the length
view.readonly is whether or not the memory is read-only.
*/
/* After using the information and you don't need it anymore */
if (PyObject_ReleaseBuffer(obj, &view) < 0) {
/* error return */
}
Ex. 4
-----------
A consumer that wants to be able to use any object's memory but is
writing an algorithm that only handle contiguous memory could do the following:
::
void *buf;
Py_ssize_t len;
char *format;
if (PyObject_GetContiguous(obj, &buf, &len, &format, 0) < 0) {
/* error return */
}
/* process memory pointed to by buffer if format is correct */
/* Optional:
if, after processing, we want to copy data from buffer back
into the the object
we could do
*/
if (PyObject_CopyToObject(obj, buf, len, 0) < 0) {
/* error return */
}
Copyright
=========
This PEP is placed in the public domain
|