NumExpr User Guide
==================

The NumExpr package supplies routines for the fast evaluation of
array expressions elementwise by using a vector-based virtual
machine.

Using it is simple::

    >>> import numpy as np
    >>> import numexpr as ne
    >>> a = np.arange(10)
    >>> b = np.arange(0, 20, 2)
    >>> c = ne.evaluate('2*a + 3*b')
    >>> c
    array([ 0,  8, 16, 24, 32, 40, 48, 56, 64, 72])


It is also possible to use NumExpr to validate an expression::

    >>> ne.validate('2*a + 3*b')

which returns `None` on success or raises an exception on invalid inputs.

It can also re-evaluate the last expression after the operands have changed::

    >>> b = np.arange(0, 40, 4)
    >>> ne.re_evaluate()
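
The validate-then-re-evaluate flow can be sketched as follows.  This is a
minimal sketch, not the canonical usage: the helper :code:`check_expression` is
hypothetical, the operands are passed explicitly through :code:`local_dict`
rather than taken from the calling frame, and the helper assumes that, depending
on the installed version, :code:`validate()` may hand the exception back as a
return value instead of raising it, so it covers both cases.

.. code-block:: python

    import numpy as np
    import numexpr as ne


    def check_expression(expr, variables):
        """Hypothetical helper: None if expr validates, else the exception."""
        try:
            # validate() may return the exception object or raise it,
            # depending on the version; handle both.
            return ne.validate(expr, local_dict=variables)
        except Exception as exc:
            return exc


    variables = {"a": np.arange(10), "b": np.arange(0, 20, 2)}

    bad = check_expression("2*a +", variables)      # incomplete expression
    ok = check_expression("2*a + 3*b", variables)   # well-formed expression

    # After a successful validation, the same expression can be re-evaluated
    # with new operand values of the same type and shape.
    variables["b"] = np.arange(0, 40, 4)
    result = ne.re_evaluate(local_dict=variables)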

Building
--------

*NumExpr* requires Python_ 3.7 or greater, and NumPy_ 1.13 or greater.  It is
built in the standard Python way:

.. code-block:: bash

    $ pip install .

You must have a C compiler (e.g. MSVC Build Tools on Windows or GCC on Linux) installed.

Then change to a directory that is not the repository directory (e.g. `/tmp`) and
test :code:`numexpr` with:

.. code-block:: bash

    $ python -c "import numexpr; numexpr.test()"

.. _Python: http://python.org
.. _NumPy: http://numpy.scipy.org


Enabling Intel VML support
--------------------------

Starting with release 1.2, NumExpr includes support for Intel's VML
library.  This allows for better performance on Intel architectures,
mainly when evaluating transcendental functions (trigonometric,
exponential, ...). It also enables NumExpr to use several CPU cores.

If you have Intel's MKL (the library that embeds VML), just copy the
:code:`site.cfg.example` that comes in the distribution to :code:`site.cfg` and
edit the latter giving proper directions on how to find your MKL
libraries in your system.  After doing this, you can proceed with the
usual building instructions listed above.  Pay attention to the
messages during the building process in order to know whether MKL has
been detected or not.  Finally, you can check the speed-ups on your
machine by running the :code:`bench/vml_timing.py` script (you can play with
different parameters to the :code:`set_vml_accuracy_mode()` and
:code:`set_vml_num_threads()` functions in the script so as to see how it would
affect performance).

Threadpool Configuration
------------------------

Threads are spawned at import-time, with the number being set by the environment
variable ``NUMEXPR_MAX_THREADS``. The default maximum thread count is **64**.
There is no advantage to spawning more threads than the number of virtual cores
available on the computing node. In practice, NumExpr scales to large thread
counts (`> 8`) only on very large matrices (`> 2**22`). Spawning large numbers
of threads is not free, and can increase import times for NumExpr or packages
that import it such as Pandas or PyTables.

If desired, the number of threads in the pool used can be adjusted via an
environment variable, ``NUMEXPR_NUM_THREADS`` (preferred) or ``OMP_NUM_THREADS``.
Typically only setting ``NUMEXPR_MAX_THREADS`` is sufficient; the number of
threads used can be adjusted dynamically via ``numexpr.set_num_threads(int)``.
The number of threads can never exceed that set by ``NUMEXPR_MAX_THREADS``.

If the user has not configured the environment prior to importing NumExpr, info
logs will be generated, and the initial number of threads *that are used* will
be set to the number of cores detected in the system or 8, whichever is *less*.

Usage::

    import os
    os.environ['NUMEXPR_MAX_THREADS'] = '16'
    os.environ['NUMEXPR_NUM_THREADS'] = '8'
    import numexpr as ne
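
The thread count can also be changed after import.  The following is a minimal
sketch under stated assumptions: the cap of ``16`` is an arbitrary example
value, and the choice of "cores or 8, whichever is less" simply mirrors the
default described above.  It relies on :code:`set_num_threads()` returning the
previous setting.

.. code-block:: python

    import os

    # Must be set before numexpr is imported; "16" is an arbitrary example cap.
    os.environ.setdefault("NUMEXPR_MAX_THREADS", "16")

    import numpy as np
    import numexpr as ne

    # Use no more threads than there are cores, and no more than 8.
    wanted = min(ne.detect_number_of_cores(), 8)
    previous = ne.set_num_threads(wanted)   # returns the earlier setting

    x = np.linspace(0.0, 1.0, 1_000_000)
    y = ne.evaluate("2*x + 1", local_dict={"x": x})

    # Put the earlier setting back; the call returns what was in effect.
    in_effect = ne.set_num_threads(previous)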

Usage Notes
-----------

`NumExpr`'s principal routine is::

    evaluate(ex, local_dict=None, global_dict=None, optimization='aggressive', truediv='auto')

where :code:`ex` is a string forming an expression, like :code:`"2*a+3*b"`.  The
values for :code:`a` and :code:`b` will by default be taken from the calling
function's frame (through the use of :code:`sys._getframe()`).
Alternatively, they can be specified using the :code:`local_dict` or
:code:`global_dict` arguments, or passed as keyword arguments.
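
As an illustration of passing operands explicitly, the sketch below supplies
them through :code:`local_dict` instead of relying on frame inspection.  The
function name :code:`weighted_sum` is hypothetical and only serves the example.

.. code-block:: python

    import numpy as np
    import numexpr as ne


    def weighted_sum(values, weights):
        # Names used in the expression are looked up in this mapping
        # rather than in the caller's frame.
        operands = {"a": values, "b": weights}
        return ne.evaluate("2*a + 3*b", local_dict=operands)


    c = weighted_sum(np.arange(10), np.arange(0, 20, 2))

Passing a dictionary keeps the expression independent of whatever local names
happen to exist at the call site.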

The :code:`optimization` parameter can take the values :code:`'moderate'`
or :code:`'aggressive'`.  :code:`'moderate'` means that no optimization is made
that can affect precision at all.  :code:`'aggressive'` (the default) means that
the expression can be rewritten in a way that precision *could* be affected, but
normally very little.  For example, in :code:`'aggressive'` mode, the
transformation :code:`x**3` -> :code:`x*x*x` is made, but not in
:code:`'moderate'` mode.
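
A quick way to see how much the two modes differ on a given input is to evaluate
the same expression under both and compare.  This is only a sketch: the input
range is arbitrary, and the size of any difference depends on the data and the
platform.

.. code-block:: python

    import numpy as np
    import numexpr as ne

    x = np.linspace(0.5, 2.0, 1000)
    operands = {"x": x}

    # 'aggressive' may rewrite x**3 as x*x*x; 'moderate' does not.
    cube_fast = ne.evaluate("x**3", local_dict=operands, optimization="aggressive")
    cube_safe = ne.evaluate("x**3", local_dict=operands, optimization="moderate")

    # Largest relative difference between the two results.
    max_rel_diff = np.max(np.abs(cube_fast - cube_safe) / np.abs(cube_safe))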

The `truediv` parameter specifies whether the division is a 'floor division'
(False) or a 'true division' (True).  The default is the value of
`__future__.division` in the interpreter.  See PEP 238 for details.

Expressions are cached, so reuse is fast.  Arrays or scalars are
allowed for the variables, which must be of type 8-bit boolean (bool),
32-bit signed integer (int), 64-bit signed integer (long),
double-precision floating point number (float), 2x64-bit,
double-precision complex number (complex) or raw string of bytes
(str).  If they are not one of these types, they will be
upcast for internal use (the result will be affected as
well).  The arrays must all be the same size.


Datatypes supported internally
------------------------------

*NumExpr* operates internally only with the following types:

    * 8-bit boolean (bool)
    * 32-bit signed integer (int or int32)
    * 64-bit signed integer (long or int64)
    * 32-bit single-precision floating point number (float or float32)
    * 64-bit, double-precision floating point number (double or float64)
    * 2x64-bit, double-precision complex number (complex or complex128)
    * Raw string of bytes (str in Python 2.7, bytes in Python 3+, numpy.str in both cases)

If the arrays in the expression do not match any of these types,
they will be upcast to one of the above types (following the usual
type inference rules, see below).  Keep this in mind when estimating
the memory consumption during the computation of your expressions.

Also, the types in NumExpr conditions are somewhat stricter than those
of Python.  For instance, the only valid constants for booleans are
:code:`True` and :code:`False`, and they are never automatically cast to integers.


Casting rules
-------------

Casting rules in NumExpr follow closely those of *NumPy*.  However, for
implementation reasons, there are some known exceptions to this rule,
namely:

    * When an array with type :code:`int8`, :code:`uint8`, :code:`int16` or
      :code:`uint16` is used inside NumExpr, it is internally upcast to an
      :code:`int` (or :code:`int32` in NumPy notation).
    * When an array with type :code:`uint32` is used inside NumExpr, it is
      internally upcast to a :code:`long` (or :code:`int64` in NumPy notation).
    * A floating point function (e.g. :code:`sin`) acting on :code:`int8` or
      :code:`int16` types returns a :code:`float64` type, instead of the
      :code:`float32` that is returned by NumPy functions.  This is mainly due
      to the absence of native :code:`int8` or :code:`int16` types in NumExpr.
    * In operations involving a scalar and an array, the normal rules of casting
      are used in NumExpr, in contrast with NumPy, where array types take
      priority.  For example, if :code:`a` is an array of type :code:`float32`
      and :code:`b` is a scalar of type :code:`float64` (or Python :code:`float`
      type, which is equivalent), then :code:`a*b` returns a :code:`float64` in
      NumExpr, but a :code:`float32` in NumPy (i.e. array operands take priority
      in determining the result type).  If you need to keep the result a
      :code:`float32`, be sure you use a :code:`float32` scalar too.
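
The exceptions listed above can be observed by inspecting result dtypes.  The
sketch below only looks at what NumExpr returns; the variable names are
illustrative, and the corresponding NumPy results are deliberately not shown
because they can vary with the NumPy version.

.. code-block:: python

    import numpy as np
    import numexpr as ne

    small = np.arange(5, dtype=np.int8)
    unsigned = np.arange(5, dtype=np.uint32)
    single = np.ones(5, dtype=np.float32)

    # int8 operands are handled as int32 internally.
    int8_sum = ne.evaluate("small + small", local_dict={"small": small})
    # uint32 operands are handled as int64 internally.
    uint32_sum = ne.evaluate("unsigned + unsigned",
                             local_dict={"unsigned": unsigned})
    # A floating point function on int8 input yields float64.
    int8_sin = ne.evaluate("sin(small)", local_dict={"small": small})
    # A float64 scalar promotes a float32 array to float64 ...
    mixed = ne.evaluate("single * factor",
                        local_dict={"single": single, "factor": np.float64(2.0)})
    # ... while a float32 scalar keeps the result in float32.
    kept = ne.evaluate("single * factor",
                       local_dict={"single": single, "factor": np.float32(2.0)})

    dtypes = [r.dtype for r in (int8_sum, uint32_sum, int8_sin, mixed, kept)]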


Supported operators
-------------------

*NumExpr* supports the set of operators listed below:

    * Bitwise and logical operators (and, or, not, xor): :code:`&, |, ~, ^`
    * Comparison operators: :code:`<, <=, ==, !=, >=, >`
    * Unary arithmetic operators: :code:`-`
    * Binary arithmetic operators: :code:`+, -, *, /, //, **, %, <<, >>`
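
A short sketch combining several of these operators is given below.  As in
NumPy, comparisons are wrapped in parentheses before being combined with
:code:`&` or :code:`~`, because the bitwise operators bind more tightly.  The
operand values are arbitrary.

.. code-block:: python

    import numpy as np
    import numexpr as ne

    a = np.arange(10)
    b = np.arange(0, 20, 2)
    operands = {"a": a, "b": b}

    # Comparison results combined with logical "and" (&) and "not" (~).
    mask = ne.evaluate("(a > 2) & ~(b >= 14)", local_dict=operands)

    # Integer arithmetic with modulo and a right shift.
    mixed = ne.evaluate("(a + b) % 4 - (b >> 1)", local_dict=operands)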


Supported functions
-------------------

The following functions are currently supported:

    * :code:`where(bool, number1, number2): number` -- number1 if the bool condition
      is true, number2 otherwise.
    * :code:`{isinf, isnan, isfinite}(float|complex): bool` -- returns element-wise True
      for ``inf``, for ``NaN``, and for values that are neither ``inf`` nor ``NaN``, respectively.
    * :code:`signbit(float|complex): bool` -- returns element-wise True if the sign bit is set,
      False otherwise.
    * :code:`{sin,cos,tan}(float|complex): float|complex` -- trigonometric sine,
      cosine or tangent.
    * :code:`{arcsin,arccos,arctan}(float|complex): float|complex` -- trigonometric
      inverse sine, cosine or tangent.
    * :code:`arctan2(float1, float2): float` -- trigonometric inverse tangent of
      float1/float2.
    * :code:`hypot(float1, float2): float` -- Euclidean distance between float1, float2
    * :code:`nextafter(float1, float2): float` -- next representable floating-point value after
      float1 in direction of float2
    * :code:`copysign(float1, float2): float` -- return number with magnitude of float1 and
      sign of float2
    * :code:`{maximum,minimum}(float1, float2): float` -- return max/min of float1, float2
    * :code:`{sinh,cosh,tanh}(float|complex): float|complex` -- hyperbolic sine,
      cosine or tangent.
    * :code:`{arcsinh,arccosh,arctanh}(float|complex): float|complex` -- hyperbolic
      inverse sine, cosine or tangent.
    * :code:`{log,log10,log1p,log2}(float|complex): float|complex` -- natural, base-10,
      log(1+x) and base-2 logarithms.
    * :code:`{exp,expm1}(float|complex): float|complex` -- exponential and exponential
      minus one.
    * :code:`sqrt(float|complex): float|complex` -- square root.
    * :code:`trunc(float): float` -- round towards zero
    * :code:`round(float|complex|int): float|complex|int` -- round to nearest integer (`rint`)
    * :code:`sign(float|complex|int): float|complex|int` -- return -1, 0, +1 depending on sign
    * :code:`abs(float|complex|int): float|complex|int`  -- absolute value.
    * :code:`conj(complex): complex` -- conjugate value.
    * :code:`{real,imag}(complex): float` -- real or imaginary part of complex.
    * :code:`complex(float, float): complex` -- complex from real and imaginary
      parts.
    * :code:`contains(np.str, np.str): bool` -- returns True for every string in :code:`op1` that
      contains :code:`op2`.

Notes
-----

    * :code:`abs()` for complex inputs returns a :code:`complex` output too.  This is a
      departure from NumPy where a :code:`float` is returned instead.  However,
      NumExpr is not flexible enough yet so as to allow this to happen.
      Meanwhile, if you want to mimic NumPy behaviour, you may want to select the
      real part via the :code:`real` function (e.g. :code:`real(abs(cplx))`) or via the
      :code:`real` selector (e.g. :code:`abs(cplx).real`).
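
The workaround described in the note can be sketched as follows, with arbitrary
sample values.  The sketch takes the real part inside the expression so that the
result is a real-valued array, as with NumPy's :code:`abs`.

.. code-block:: python

    import numpy as np
    import numexpr as ne

    z = np.array([3 + 4j, -1j, 2.0 + 0j])

    # real() strips the complex wrapper that abs() leaves on the magnitudes.
    magnitude = ne.evaluate("real(abs(z))", local_dict={"z": z})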

More functions can be added if you need them. Note however that NumExpr 2.6 is
in maintenance mode and a new major revision is under development.

Supported reduction operations
------------------------------

The following reduction operations are currently supported:

  * :code:`sum(number, axis=None)`: Sum of array elements over a given axis.
    Negative axes are not supported.
  * :code:`prod(number, axis=None)`: Product of array elements over a given axis.
    Negative axes are not supported.

*Note:* because of internal limitations, reduction operations must appear
last in the stack.  Otherwise, an error like the following is raised::

    >>> ne.evaluate('sum(1)*(-1)')
    RuntimeError: invalid program: reduction operations must occur last
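
One way to work within this limitation is to keep the reduction as the outermost
operation and apply any further arithmetic to the returned array in NumPy.  The
sketch below assumes small two-dimensional float arrays chosen only for
illustration.

.. code-block:: python

    import numpy as np
    import numexpr as ne

    a = np.arange(12, dtype=np.float64).reshape(3, 4)
    b = np.full((3, 4), 0.5)
    operands = {"a": a, "b": b}

    # The reduction is the last (outermost) operation in each expression.
    row_totals = ne.evaluate("sum(a*b, axis=1)", local_dict=operands)
    total = ne.evaluate("sum(a*b)", local_dict=operands)

    # Post-processing that would otherwise follow the reduction is done
    # outside the expression.
    negated = -ne.evaluate("sum(a*b, axis=0)", local_dict=operands)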

General routines
----------------

  * :code:`evaluate(expression, local_dict=None, global_dict=None,
    optimization='aggressive', truediv='auto')`:  Evaluate a simple array
    expression element-wise.  See examples above.
  * :code:`re_evaluate(local_dict=None)`:  Re-evaluate the last array expression
    without any check.  This is meant for accelerating loops that are re-evaluating
    the same expression repeatedly without changing anything other than the operands.
    If unsure, use evaluate() which is safer.
  * :code:`test()`:  Run all the tests in the test suite.
  * :code:`print_versions()`:  Print the versions of software that numexpr relies on.
  * :code:`set_num_threads(nthreads)`: Sets the number of threads to be used in operations.
    Returns the previous setting for the number of threads.  See the Threadpool
    Configuration section above for how the number of threads is set via
    environment variables.

    If you are using VML, you may want to use :code:`set_vml_num_threads(nthreads)` to
    perform the parallel job with VML instead.  However, you should get very
    similar performance with VML-optimized functions, and VML's parallelizer
    cannot deal with common expressions like `(x+1)*(x-2)`, while NumExpr's
    can.

  * :code:`detect_number_of_cores()`: Detects the number of cores on a system.
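
The loop pattern that :code:`re_evaluate()` is meant for might look like the
sketch below.  It assumes that the operands keep the same names, types and
shapes on every iteration, since no checks are performed, and it passes them
explicitly through :code:`local_dict`.

.. code-block:: python

    import numpy as np
    import numexpr as ne

    a = np.arange(10, dtype=np.float64)
    operands = {"a": a, "b": np.zeros(10)}

    # The first call goes through evaluate(), which checks the expression.
    results = [ne.evaluate("2*a + 3*b", local_dict=operands)]

    # Later iterations only swap the operand values and skip the checks.
    for step in range(1, 4):
        operands["b"] = np.full(10, float(step))
        results.append(ne.re_evaluate(local_dict=operands))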


Intel's VML specific support routines
-------------------------------------

When compiled with Intel's VML (Vector Math Library), you will be able
to use some additional functions for controlling its use. These are:

  * :code:`set_vml_accuracy_mode(mode)`:  Set the accuracy for VML operations.

    The :code:`mode` parameter can take the values:

    - :code:`'low'`: Equivalent to VML_LA - low accuracy VML functions are called
    - :code:`'high'`: Equivalent to VML_HA - high accuracy VML functions are called
    - :code:`'fast'`: Equivalent to VML_EP - enhanced performance VML functions are called

    It returns the previous mode.

    This call is equivalent to the :code:`vmlSetMode()` in the VML library. See:

    http://www.intel.com/software/products/mkl/docs/webhelp/vml/vml_DataTypesAccuracyModes.html

    for more info on the accuracy modes.

  * :code:`set_vml_num_threads(nthreads)`: Suggests a maximum number of
    threads to be used in VML operations.

    This function is equivalent to the call
    :code:`mkl_domain_set_num_threads(nthreads, MKL_VML)` in the MKL library.
    See:

    http://www.intel.com/software/products/mkl/docs/webhelp/support/functn_mkl_domain_set_num_threads.html

    for more info about it.

  * :code:`get_vml_version()`:  Get the VML/MKL library version.
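
Because these routines only have an effect when NumExpr was built with VML, code
that uses them may want to guard the calls.  The sketch below assumes the
:code:`use_vml` flag exposed by the package as the guard; the accuracy mode and
thread count are arbitrary example values.

.. code-block:: python

    import numpy as np
    import numexpr as ne

    x = np.linspace(0.0, np.pi, 1000)

    if ne.use_vml:
        # Remember the earlier mode so it can be restored afterwards.
        previous_mode = ne.set_vml_accuracy_mode("high")
        ne.set_vml_num_threads(1)

    y = ne.evaluate("sin(x)", local_dict={"x": x})

    if ne.use_vml:
        ne.set_vml_accuracy_mode(previous_mode)

Without VML the guarded calls are skipped and the expression is evaluated by
NumExpr's own implementation.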


Authors
-------

.. include:: ../AUTHORS.txt

License
-------

NumExpr is distributed under the MIT_ license.

.. _MIT: http://www.opensource.org/licenses/mit-license.php