.. currentmodule:: numpy

.. _alignment:

****************
Memory alignment
****************

NumPy alignment goals
=====================

There are three use-cases related to memory alignment in NumPy (as of 1.14):

1. Creating :term:`structured datatypes <structured data type>` with
   :term:`fields <field>` aligned like in a C-struct.
2. Speeding up copy operations by using :class:`uint` assignment instead of
   ``memcpy``.
3. Guaranteeing safe aligned access for ufuncs/setitem/casting code.

NumPy uses two different forms of alignment to achieve these goals:
"True alignment" and "Uint alignment".

"True" alignment refers to the architecture-dependent alignment of an
equivalent C-type in C. For example, on x64 systems :attr:`float64` is
equivalent to ``double`` in C. On most systems, this has an alignment of either
4 or 8 bytes (and this can be controlled in GCC by the option
``-malign-double``).  A variable is aligned in memory if its memory offset is a
multiple of its alignment. On some systems (e.g. SPARC) memory alignment is
required; on others, it gives a speedup.
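
As a minimal sketch of the definition above, the snippet below prints the
"true" alignment NumPy reports for a few datatypes and checks whether an
array's data address is a multiple of that alignment. The helper
``is_offset_aligned`` is a hypothetical name used only for illustration, and
the printed alignments depend on the architecture and compiler.

```python
import numpy as np


def is_offset_aligned(address, alignment):
    """Illustrative helper (not a NumPy API): True if address is a
    multiple of alignment."""
    return address % alignment == 0


# dtype.alignment reports the "true" alignment of each type on this platform.
for name in ("uint8", "int32", "float64", "complex64"):
    dt = np.dtype(name)
    print(name, "itemsize:", dt.itemsize, "alignment:", dt.alignment)

# The address of the first element of a freshly allocated array.
a = np.zeros(4, dtype=np.float64)
address = a.__array_interface__["data"][0]
print("data pointer aligned:", is_offset_aligned(address, a.dtype.alignment))
```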

"Uint" alignment depends on the size of a datatype. It is defined to be the
"True alignment" of the uint used by NumPy's copy-code to copy the datatype, or
undefined/unaligned if there is no equivalent uint. Currently, NumPy uses
``uint8``, ``uint16``, ``uint32``, ``uint64``, and ``uint64`` to copy data of
size 1, 2, 4, 8, 16 bytes respectively (``uint64`` is used for both 8-byte and
16-byte items), and datatypes of any other size cannot be uint-aligned.

For example, on a (typical Linux x64 GCC) system, the NumPy :attr:`complex64`
datatype is implemented as ``struct { float real, imag; }``. This has "true"
alignment of 4 and "uint" alignment of 8 (equal to the true alignment of
``uint64``).

Some cases where uint and true alignment are different (default GCC Linux):

======   =========   ========    ========
arch     type        true-aln    uint-aln
======   =========   ========    ========
x86_64   complex64          4           8
x86_64   float128          16           8
x86      float96            4          \-
======   =========   ========    ========
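
The two notions can be compared from Python with a small sketch. The
``uint_alignment`` helper and the ``COPY_UINT`` table are hypothetical names
that only mirror the size-to-uint mapping described above; they are not part of
NumPy's API, and the values printed depend on the platform.

```python
import numpy as np

# Size-to-uint mapping as described in the text (illustrative table).
COPY_UINT = {1: np.uint8, 2: np.uint16, 4: np.uint32, 8: np.uint64, 16: np.uint64}


def uint_alignment(dtype):
    """Illustrative helper: the true alignment of the uint used to copy
    dtype, or None if no uint matches its itemsize."""
    dt = np.dtype(dtype)
    copy_type = COPY_UINT.get(dt.itemsize)
    if copy_type is None:
        return None
    return np.dtype(copy_type).alignment


for name in ("complex64", "float64", "S3"):
    dt = np.dtype(name)
    print(name, "true:", dt.alignment, "uint:", uint_alignment(dt))
```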


Variables in NumPy which control and describe alignment
=======================================================

The word ``align`` is used in four relevant ways in NumPy:

* The :attr:`dtype.alignment` attribute (``descr->alignment`` in C). This is
  meant to reflect the "true alignment" of the type. It has arch-dependent
  default values for all datatypes, except for the structured types created
  with ``align=True`` as described below.
* The ``ALIGNED`` flag of an ndarray, computed in ``IsAligned`` and checked
  by :c:func:`PyArray_ISALIGNED`. This is computed from
  :attr:`dtype.alignment`.
  It is set to ``True`` if every item in the array is at a memory location
  consistent with :attr:`dtype.alignment`, which is the case if the
  ``data ptr`` and all strides of the array are multiples of that alignment.
* The ``align`` keyword of the dtype constructor, which only affects
  :ref:`structured_arrays`. If the structure's field offsets are not manually
  provided, NumPy determines offsets automatically. In that case,
  ``align=True`` pads the structure so that each field is "true" aligned in
  memory and sets :attr:`dtype.alignment` to be the largest of the field
  "true" alignments. This is like what C-structs usually do. Otherwise, if
  offsets or itemsize were manually provided, ``align=True`` simply checks that
  all the fields are "true" aligned and that the total itemsize is a multiple
  of the largest field alignment. In either case, :attr:`dtype.isalignedstruct`
  is also set to True.
* ``IsUintAligned`` is used to determine if an ndarray is "uint aligned" in
  an analogous way to how ``IsAligned`` checks for true alignment.
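
The effect of the ``align`` keyword and of the ``ALIGNED`` flag can be observed
from Python. The sketch below builds the same two-field structure with and
without ``align=True``, then views a byte buffer as ``float64`` starting one
byte in, which would normally leave the data pointer misaligned. The field
names are arbitrary, and the aligned offsets depend on the platform.

```python
import numpy as np

fields = [("flag", np.uint8), ("value", np.float64)]

packed = np.dtype(fields)               # no padding between fields
aligned = np.dtype(fields, align=True)  # padded like a C struct

for label, dt in (("packed", packed), ("aligned", aligned)):
    print(label,
          "itemsize:", dt.itemsize,
          "offset of value:", dt.fields["value"][1],
          "alignment:", dt.alignment,
          "isalignedstruct:", dt.isalignedstruct)

# A float64 view that starts one byte into a buffer is normally not
# true-aligned, so its ALIGNED flag is expected to be False.
buf = np.zeros(8 * 4 + 1, dtype=np.uint8)
unaligned = buf[1:].view(np.float64)
print("ALIGNED flag:", unaligned.flags.aligned)
```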

Consequences of alignment
=========================

Here is how the variables above are used:

1. Creating aligned structs: To know how to offset a field when
   ``align=True``, NumPy looks up ``field.dtype.alignment``. This includes
   fields that are nested structured arrays.
2. Ufuncs: If the ``ALIGNED`` flag of an array is False, ufuncs will
   buffer/cast the array before evaluation. This is needed since ufunc inner
   loops access raw elements directly, which might fail on some archs if the
   elements are not true-aligned.
3. Getitem/setitem/copyswap functions: Similar to ufuncs, these functions
   generally have two code paths. If ``ALIGNED`` is False they will
   use a code path that buffers the arguments so they are true-aligned.
4. Strided copy code: Here, "uint alignment" is used instead.  If the itemsize
   of an array is equal to 1, 2, 4, 8 or 16 bytes and the array is uint
   aligned, then NumPy will do ``*((uintN*)dst) = *((uintN*)src)`` for
   appropriate N. Otherwise, NumPy copies by doing ``memcpy(dst, src, N)``.
5. Nditer code: Since this often calls the strided copy code, it must
   check for "uint alignment".
6. Cast code: This checks for "true" alignment, as it does
   ``*dst = CASTFUNC(*src)`` if aligned. Otherwise, it does
   ``memmove(srcval, src); dstval = CASTFUNC(srcval); memmove(dst, dstval)``
   where dstval/srcval are aligned.
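
From the user's side these code paths are invisible: operations on an array
whose ``ALIGNED`` flag is False still give correct results. The sketch below
illustrates this with a deliberately misaligned ``float64`` view; it shows only
the observable behaviour and does not reveal which internal path NumPy takes.

```python
import numpy as np

# Build a float64 view that starts one byte into a byte buffer.
buf = np.zeros(8 * 3 + 1, dtype=np.uint8)
x = buf[1:].view(np.float64)

x[:] = [1.0, 2.0, 3.0]     # setitem on a misaligned array
y = np.add(x, 1.0)         # ufunc with a misaligned input
z = x.astype(np.float32)   # cast from a misaligned array

print("input ALIGNED:", x.flags.aligned)
print("output ALIGNED:", y.flags.aligned)
print(y.tolist(), z.tolist())
```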

Note that the strided-copy and strided-cast code are deeply intertwined and so
any arrays being processed by them must be both uint and true aligned, even
though the copy-code only needs uint alignment and the cast code only true
alignment.  If there is ever a big rewrite of this code it would be good to
allow them to use different alignments.
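
To make the distinction concrete, here is a simplified Python model of the two
checks. The helpers ``is_true_aligned`` and ``is_uint_aligned`` are
hypothetical stand-ins for the C functions ``IsAligned`` and ``IsUintAligned``
and ignore some of their edge cases. On a platform where ``complex64`` has true
alignment 4 and ``uint64`` has alignment 8, the view below would be
true-aligned but not uint-aligned.

```python
import numpy as np

# Size-to-uint mapping as described earlier in the text (illustrative table).
COPY_UINT = {1: np.uint8, 2: np.uint16, 4: np.uint32, 8: np.uint64, 16: np.uint64}


def _pointer_and_strides_aligned(arr, alignment):
    """True if the data pointer and the strides of all axes longer than
    one element are multiples of alignment (simplified model)."""
    address = arr.__array_interface__["data"][0]
    strides_ok = all(s % alignment == 0
                     for n, s in zip(arr.shape, arr.strides) if n > 1)
    return address % alignment == 0 and strides_ok


def is_true_aligned(arr):
    return _pointer_and_strides_aligned(arr, arr.dtype.alignment)


def is_uint_aligned(arr):
    copy_type = COPY_UINT.get(arr.dtype.itemsize)
    if copy_type is None:
        return False  # no equivalent uint, so never uint-aligned
    return _pointer_and_strides_aligned(arr, np.dtype(copy_type).alignment)


# A complex64 view that starts 4 bytes into a byte buffer.
buf = np.zeros(4 + 8 * 3, dtype=np.uint8)
c = buf[4:].view(np.complex64)
print("true aligned:", is_true_aligned(c), "uint aligned:", is_uint_aligned(c))
```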