1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128
|
Announcement for numarray
We have been working on a reimplementation of Numeric, the numeric
array manipulation extension module for Python. The reimplementation
is virtually a complete rewrite.
While we believe that performance is generally very good for arrays
larger then 100K elements, there are functions and aspects of numarray
which have not been optimized yet, so please be careful about drawing
conclusions about its efficiency from limited test cases. We also
welcome those that would like to contribute to this effort by helping
with the development, or adding libraries.
This new version was developed for number of reasons. To summarize,
we regularly deal with large datasets and the new version gives us the
capabilities that we feel are necessary for working with such
datasets. In particular:
1) Avoiding promotion of array types in expressions involving Python
scalars (e.g., 2.*<Float32 array> should not result in a Float64
array).
2) Ability to use memory mapped files [largely implemented].
3) Ability to access data in fields of arrays of records as numeric
arrays without copying the data to a new array.
4) Ability to reference byteswapped data or non-aligned data (as might
be found in record arrays) without producing new temporary arrays.
5) Reuse temporary arrays in expressions when possible [not
implemented yet]
6) Provide more convenient use of index arrays (put and take).
A new implementation was decided upon since many of the existing numpy
developers agree that the existing implementation is not suitable for
massive changes and enhancements. Furthermore Guido is very reluctant
to accept Numeric into the Standard Library unless it was rewritten.
This version has nearly the fully functionality of the basic Numeric
(only masked arrays and the convolve function are missing, and the use
of the ufunc methods reduce, accumulate, and outer for complex
types). No other libraries have been adapted to to Numarray yet.
NUMARRAY IS NOT FULLY COMPATIBLE WITH NUMERIC. (But is very similar in
most respects). The incompatibilities will be listed below.
Major Compatibility issues:
1) Coercion rules are different. Expressions involving
scalars may not produce the same type of arrays.
2) Types are represented by Type Objects rather than character
codes (though the old character codes may still be used
as arguments to the functions).
3) The C interface implements a subset of the Numeric API to provide
source level compatibility for Numeric extensions. This means that
Numeric extensions need to be ported to and recompiled for
numarray. Numeric C-extension binaries are not directly useable by
numarray, but Numeric source code typically is easily ported to
numarray.
OPEN ISSUES:
There are still some open questions about the interface which we hope
to resolve with feedback from the Numeric commmunity. These issues are
listed later. Depending on what the consensus is, we may change
aspects of the user interface. It is not our intent to make the case
for either side. This should be done by the proponents of each view in
more detail on the numpy mailing list.
1) Currently a number of functions or methods (e.g., reduce) act on
the first dimension rather than the last by default (as does
Numeric). Some feel that it makes sense to always apply functions
to the most rapidly varying dimension by default and they are
proposing that we change numarray to reflect this. Others say that
the current behavior is more compatible with Python behavior, and
that which dimension a function applies to depends on the function
(reduce would apply to the first while FFT would apply to the
last).
2) The ones and zeros functions should return Float64 arrays
by default (currently Int32)
3) What is the necessary degree of optimization for small arrays? The
current implementation has much of the code in Python. The
performance on large arrays is generally fast (or easy to
optimize), but reducing the overhead for setting up array
operations will require much more of the current implementation be
rewritten in C or more complex algorithms in Python. Either
approach involves significant work and a more difficult maintain
module. We have not yet done any significant benchmarking lately,
but we do have some order of magnitude estimates about relative
performance. (The following numbers depend on the platform so do
not take them too seriously). IDL is already at about half its
asymptotic throughput for 200 element arrays. Numpy is at its half
throughput for 2000 element arrays. We estimate that numarray is
probably another order of magnitude worse, i.e., that 20K element
arrays are at half the asymptotic speed. How much should this be
improved? We have some thoughts about other approaches that don't
require this sort of optimization, but rather try casting iterative
calculations on small arrays into a form that numarray can perform
in one call. These ideas will be elaborated on in a separate
message.
4) What types are of highest importance to add: Float128, Complex256.
New capabilities:
o Record arrays: arrays of records that have fixed format fields.
o Character arrays: arrays of fixed-length character strings.
o Support for memory mapped files.
o Use of index arrays within subscripts: e.g.
if ind = array([4, 4, 0, 2]) and x = 2*arange(6),
x[inx] results in array([8, 8, 0, 4])
o Support for byteswapped representation.
Things yet to be done:
o Add all the commonly available Numeric libraries to numarray.
o Benchmarking and optimization. For example, a some of the
functions could be reimplemented with much higher performance
(e.g., array printing).
We have developed a revised version of the Numeric manual for numarray
(all in all, there is a great deal of commonality).
|