File: indexing.py

package info (click to toggle)
python-numpy 1%3A1.4.1-5
links: PTS, VCS
area: main
in suites: squeeze
size: 12,736 kB
ctags: 14,035
sloc: ansic: 80,776; python: 62,482; makefile: 183; fortran: 121; f90: 42; cpp: 28; perl: 19; sh: 15
file content (384 lines) | stat: -rw-r--r-- 14,286 bytes
"""
==============
Array indexing
==============

Array indexing refers to any use of the square brackets ([]) to index
array values. There are many options to indexing, which give numpy
indexing great power, but with power comes some complexity and the
potential for confusion. This section is just an overview of the
various options and issues related to indexing. Aside from single
element indexing, the details on most of these options are to be
found in related sections.

Assignment vs referencing
=========================

Most of the following examples show the use of indexing when referencing
data in an array. The examples work just as well when assigning to an
array. See the section at the end for specific examples and explanations
on how assignments work.

Single element indexing
=======================

Single element indexing for a 1-D array is what one expects. It work
exactly like that for other standard Python sequences. It is 0-based,
and accepts negative indices for indexing from the end of the array. ::

    >>> x = np.arange(10)
    >>> x[2]
    2
    >>> x[-2]
    8

Unlike lists and tuples, numpy arrays support multidimensional indexing
for multidimensional arrays. That means that it is not necessary to
separate each dimension's index into its own set of square brackets. ::

    >>> x.shape = (2,5) # now x is 2-dimensional
    >>> x[1,3]
    8
    >>> x[1,-1]
    9

Note that if one indexes a multidimensional array with fewer indices
than dimensions, one gets a subdimensional array. For example: ::

    >>> x[0]
    array([0, 1, 2, 3, 4])

That is, each index specified selects the array corresponding to the rest
of the dimensions selected. In the above example, choosing 0 means that
remaining dimension of lenth 5 is being left unspecified, and that what
is returned is an array of that dimensionality and size. It must be noted
that the returned array is not a copy of the original, but points to the
same values in memory as does the original array (a new view of the same
data in other words, see xxx for details). In this case,
the 1-D array at the first position (0) is returned. So using a single
index on the returned array, results in a single element being returned.
That is: ::

    >>> x[0][2]
    2

So note that ``x[0,2] = x[0][2]`` though the second case is more inefficient
a new temporary array is created after the first index that is subsequently
indexed by 2.

Note to those used to IDL or Fortran memory order as it relates to indexing.
Numpy uses C-order indexing. That means that the last index usually (see
xxx for exceptions) represents the most rapidly changing memory location,
unlike Fortran or IDL, where the first index represents the most rapidly
changing location in memory. This difference represents a great potential
for confusion.

Other indexing options
======================

It is possible to slice and stride arrays to extract arrays of the same
number of dimensions, but of different sizes than the original. The slicing
and striding works exactly the same way it does for lists and tuples except
that they can be applied to multiple dimensions as well. A few
examples illustrates best: ::

 >>> x = np.arange(10)
 >>> x[2:5]
 array([2, 3, 4])
 >>> x[:-7]
 array([0, 1, 2])
 >>> x[1:7:2]
 array([1,3,5])
 >>> y = np.arange(35).reshape(5,7)
 >>> y[1:5:2,::3]
 array([[ 7, 10, 13],
        [21, 24, 27]])

Note that slices of arrays do not copy the internal array data but
also produce new views of the original data (see xxx for more
explanation of this issue).

It is possible to index arrays with other arrays for the purposes of
selecting lists of values out of arrays into new arrays. There are two
different ways of accomplishing this. One uses one or more arrays of
index values (see xxx for details). The other involves giving a boolean
array of the proper shape to indicate the values to be selected.
Index arrays are a very powerful tool that allow one to avoid looping
over individual elements in arrays and thus greatly improve performance
(see xxx for examples)

It is possible to use special features to effectively increase the
number of dimensions in an array through indexing so the resulting
array aquires the shape needed for use in an expression or with a
specific function. See xxx.

Index arrays
============

Numpy arrays may be indexed with other arrays (or any other sequence-like
object that can be converted to an array, such as lists, with the exception
of tuples; see the end of this document for why this is). The use of index
arrays ranges from simple, straightforward cases to complex, hard-to-understand
cases. For all cases of index arrays, what is returned is a copy of the
original data, not a view as one gets for slices.

Index arrays must be of integer type. Each value in the array indicates which
value in the array to use in place of the index. To illustrate: ::

 >>> x = np.arange(10,1,-1)
 >>> x
 array([10,  9,  8,  7,  6,  5,  4,  3,  2])
 >>> x[np.array([3, 3, 1, 8])]
 array([7, 7, 9, 2])


The index array consisting of the values 3, 3, 1 and 8 correspondingly create
an array of length 4 (same as the index array) where each index is replaced by
the value the index array has in the array being indexed.

Negative values are permitted and work as they do with single indices or slices: ::

 >>> x[np.array([3,3,-3,8])]
 array([7, 7, 4, 2])

It is an error to have index values out of bounds: ::

 >>> x[np.array([3, 3, 20, 8])]
 <type 'exceptions.IndexError'>: index 20 out of bounds 0<=index<9

Generally speaking, what is returned when index arrays are used is an array with
the same shape as the index array, but with the type and values of the array being
indexed. As an example, we can use a multidimensional index array instead: ::

 >>> x[np.array([[1,1],[2,3]])]
 array([[9, 9],
        [8, 7]])

Indexing Multi-dimensional arrays
=================================

Things become more complex when multidimensional arrays are indexed, particularly
with multidimensional index arrays. These tend to be more unusal uses, but they
are permitted, and they are useful for some problems. We'll  start with the
simplest multidimensional case (using the array y from the previous examples): ::

 >>> y[np.array([0,2,4]), np.array([0,1,2])]
 array([ 0, 15, 30])

In this case, if the index arrays have a matching shape, and there is an index
array for each dimension of the array being indexed, the resultant array has the
same shape as the index arrays, and the values correspond to the index set for each
position in the index arrays. In this example, the first index value is 0 for both
index arrays, and thus the first value of the resultant array is y[0,0]. The next
value is y[2,1], and the last is y[4,2].

If the index arrays do not have the same shape, there is an attempt to broadcast
them to the same shape. Broadcasting won't be discussed here but is discussed in
detail in xxx. If they cannot be broadcast to the same shape, an exception is
raised: ::

 >>> y[np.array([0,2,4]), np.array([0,1])]
 <type 'exceptions.ValueError'>: shape mismatch: objects cannot be broadcast to a single shape

The broadcasting mechanism permits index arrays to be combined with scalars for
other indices. The effect is that the scalar value is used for all the corresponding
values of the index arrays: ::

 >>> y[np.array([0,2,4]), 1]
 array([ 1, 15, 29])

Jumping to the next level of complexity, it is possible to only partially index an array
with index arrays. It takes a bit of thought to understand what happens in such cases.
For example if we just use one index array with y: ::

 >>> y[np.array([0,2,4])]
 array([[ 0,  1,  2,  3,  4,  5,  6],
        [14, 15, 16, 17, 18, 19, 20],
        [28, 29, 30, 31, 32, 33, 34]])

What results is the construction of a new array where each value of the index array
selects one row from the array being indexed and the resultant array has the resulting
shape (size of row, number index elements).

An example of where this may be useful is for a color lookup table where we want to map
the values of an image into RGB triples for display. The lookup table could have a shape
(nlookup, 3). Indexing such an array with an image with shape (ny, nx) with dtype=np.uint8
(or any integer type so long as values are with the bounds of the lookup table) will
result in an array of shape (ny, nx, 3) where a triple of RGB values is associated with
each pixel location.

In general, the shape of the resulant array will be the concatenation of the shape of
the index array (or the shape that all the index arrays were broadcast to) with the
shape of any unused dimensions (those not indexed) in the array being indexed.

Boolean or "mask" index arrays
==============================

Boolean arrays used as indices are treated in a different manner entirely than index
arrays. Boolean arrays must be of the same shape as the array being indexed, or
broadcastable to the same shape. In the most straightforward case, the boolean array
has the same shape: ::

 >>> b = y>20
 >>> y[b]
 array([21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])

The result is a 1-D array containing all the elements in the indexed array corresponding
to all the true elements in the boolean array. As with index arrays, what is returned
is a copy of the data, not a view as one gets with slices.

With broadcasting, multidimesional arrays may be the result. For example: ::

 >>> b[:,5] # use a 1-D boolean that broadcasts with y
 array([False, False, False,  True,  True], dtype=bool)
 >>> y[b[:,5]]
 array([[21, 22, 23, 24, 25, 26, 27],
        [28, 29, 30, 31, 32, 33, 34]])

Here the 4th and 5th rows are selected from the indexed array and combined to make a
2-D array.

Combining index arrays with slices
==================================

Index arrays may be combined with slices. For example: ::

 >>> y[np.array([0,2,4]),1:3]
 array([[ 1,  2],
        [15, 16],
        [29, 30]])

In effect, the slice is converted to an index array np.array([[1,2]]) (shape (1,2)) that is
broadcast with the index array to produce a resultant array of shape (3,2).

Likewise, slicing can be combined with broadcasted boolean indices: ::

 >>> y[b[:,5],1:3]
 array([[22, 23],
        [29, 30]])

Structural indexing tools
=========================

To facilitate easy matching of array shapes with expressions and in
assignments, the np.newaxis object can be used within array indices
to add new dimensions with a size of 1. For example: ::

 >>> y.shape
 (5, 7)
 >>> y[:,np.newaxis,:].shape
 (5, 1, 7)

Note that there are no new elements in the array, just that the
dimensionality is increased. This can be handy to combine two
arrays in a way that otherwise would require explicitly reshaping
operations. For example: ::

 >>> x = np.arange(5)
 >>> x[:,np.newaxis] + x[np.newaxis,:]
 array([[0, 1, 2, 3, 4],
        [1, 2, 3, 4, 5],
        [2, 3, 4, 5, 6],
        [3, 4, 5, 6, 7],
        [4, 5, 6, 7, 8]])

The ellipsis syntax maybe used to indicate selecting in full any
remaining unspecified dimensions. For example: ::

 >>> z = np.arange(81).reshape(3,3,3,3)
 >>> z[1,...,2]
 array([[29, 32, 35],
        [38, 41, 44],
        [47, 50, 53]])

This is equivalent to: ::

 >>> z[1,:,:,2]

Assigning values to indexed arrays
==================================

As mentioned, one can select a subset of an array to assign to using
a single index, slices, and index and mask arrays. The value being
assigned to the indexed array must be shape consistent (the same shape
or broadcastable to the shape the index produces). For example, it is
permitted to assign a constant to a slice: ::

 >>> x[2:7] = 1

or an array of the right size: ::

 >>> x[2:7] = np.arange(5)

Note that assignments may result in changes if assigning
higher types to lower types (like floats to ints) or even
exceptions (assigning complex to floats or ints): ::

 >>> x[1] = 1.2
 >>> x[1]
 1
 >>> x[1] = 1.2j
 <type 'exceptions.TypeError'>: can't convert complex to long; use long(abs(z))


Unlike some of the references (such as array and mask indices)
assignments are always made to the original data in the array
(indeed, nothing else would make sense!). Note though, that some
actions may not work as one may naively expect. This particular
example is often surprising to people: ::

 >>> x[np.array([1, 1, 3, 1]) += 1

Where people expect that the 1st location will be incremented by 3.
In fact, it will only be incremented by 1. The reason is because
a new array is extracted from the original (as a temporary) containing
the values at 1, 1, 3, 1, then the value 1 is added to the temporary,
and then the temporary is assigned back to the original array. Thus
the value of the array at x[1]+1 is assigned to x[1] three times,
rather than being incremented 3 times.

Dealing with variable numbers of indices within programs
========================================================

The index syntax is very powerful but limiting when dealing with
a variable number of indices. For example, if you want to write
a function that can handle arguments with various numbers of
dimensions without having to write special case code for each
number of possible dimensions, how can that be done? If one
supplies to the index a tuple, the tuple will be interpreted
as a list of indices. For example (using the previous definition
for the array z): ::

 >>> indices = (1,1,1,1)
 >>> z[indices]
 40

So one can use code to construct tuples of any number of indices
and then use these within an index.

Slices can be specified within programs by using the slice() function
in Python. For example: ::

 >>> indices = (1,1,1,slice(0,2)) # same as [1,1,1,0:2]
 array([39, 40])

Likewise, ellipsis can be specified by code by using the Ellipsis object: ::

 >>> indices = (1, Ellipsis, 1) # same as [1,...,1]
 >>> z[indices]
 array([[28, 31, 34],
        [37, 40, 43],
        [46, 49, 52]])

For this reason it is possible to use the output from the np.where()
function directly as an index since it always returns a tuple of index arrays.

Because the special treatment of tuples, they are not automatically converted
to an array as a list would be. As an example: ::

 >>> z[[1,1,1,1]]
 ... # produces a large array
 >>> z[(1,1,1,1)]
 40 # returns a single value

"""