Select Along One of N Dimensions in Array

Select along one of n dimensions in array

I spent quite a lot of time figuring out the fastest way to do this for plyr, and the best I could come up with was manually constructing the call to [:

index_array <- function(x, dim, value, drop = FALSE) { 
# Create list representing arguments supplied to [
# bquote() creates an object corresponding to a missing argument
indices <- rep(list(bquote()), length(dim(x)))
indices[[dim]] <- value

# Generate the call to [
call <- as.call(c(
list(as.name("["), quote(x)),
indices,
list(drop = drop)))
# Print it, just to make it easier to see what's going on
print(call)

# Finally, evaluate it
eval(call)
}

(You can find more information about this technique at https://github.com/hadley/devtools/wiki/Computing-on-the-language)

You can then use it as follows:

A <- array(data=NA, dim=c(2,4,4), dimnames=list(c("x","y"),NULL,NULL))
index_array(A, 2, 2)
index_array(A, 2, 2, drop = TRUE)
index_array(A, 3, 2, drop = TRUE)

It would also generalise in a straightforward way if you want to extract based on more than one dimension, but you'd need to rethink the arguments to the function.

Select and apply condition on numpy n-dimensional inner array

In [78]: arr = np.array([[[248,  26,   4],
...: [ 99, 126, 156],
...: [ 80, 240, 232],
...: [136, 27, 216]],
...:
...: [[221, 130, 119],
...: [253, 188, 232],
...: [159, 21, 98],
...: [ 12, 35, 50]]])
...:

Here's a 'masking' way of handling the >100 case:

In [79]: (arr>100)
Out[79]:
array([[[ True, False, False],
[False, True, True],
[False, True, True],
[ True, False, True]],

[[ True, True, True],
[ True, True, True],
[ True, False, False],
[False, False, False]]])
In [80]: (arr>100).all(axis=2)
Out[80]:
array([[False, False, False, False],
[ True, True, False, False]])
In [81]: arr[_]
Out[81]:
array([[221, 130, 119],
[253, 188, 232]])
In [82]: arr1=arr.copy()
In [83]: arr1[(arr>100).all(axis=2)]=255
In [84]: arr1
Out[84]:
array([[[248, 26, 4],
[ 99, 126, 156],
[ 80, 240, 232],
[136, 27, 216]],

[[255, 255, 255],
[255, 255, 255],
[159, 21, 98],
[ 12, 35, 50]]])

and for the 2nd condition, using 75 for arr[:,:,2]:

In [94]: arr1[(arr<np.array([25,50,75])).all(axis=2)]=0
In [95]: arr1
Out[95]:
array([[[248, 26, 4],
[ 99, 126, 156],
[ 80, 240, 232],
[136, 27, 216]],

[[255, 255, 255],
[255, 255, 255],
[159, 21, 98],
[ 0, 0, 0]]])

Here's a way of using these masks with select. Note that I have to adjust the dimensions of the masks so they work with the 3d default.

In [104]: mask1 = (arr>100).all(axis=2)
In [105]: mask2 = (arr<np.array([25,50,75])).all(axis=2)
In [106]: mask1.shape
Out[106]: (2, 4)
In [107]: arr.shape
Out[107]: (2, 4, 3)
In [108]: np.select([mask1[:,:,None],mask2[:,:,None]],[255,0], default=arr)
Out[108]:
array([[[248, 26, 4],
[ 99, 126, 156],
[ 80, 240, 232],
[136, 27, 216]],

[[255, 255, 255],
[255, 255, 255],
[159, 21, 98],
[ 0, 0, 0]]])

Or use:

In [109]: mask1 = (arr>100).all(axis=2, keepdims=True)
In [110]: mask1.shape
Out[110]: (2, 4, 1)

To use your syntax, we need to add () and use & instead of and:

In [111]: (arr[:,:,0] > 100) & (arr[:,:,1] > 100) & (arr[:,:,2] > 100)
Out[111]:
array([[False, False, False, False],
[ True, True, False, False]])

How to select elements from a multidimensional array with python : array[:][i][j]?

You can do as follows to select for all channel, the first element of the matrix:

A[:, 0, 0]

: references all channels, 0 the first row and 0 the first column.

Output:

array([0.89565135,0.70610822,0.71822397])

EDIT:

If your are working with nested list, you can do as follows:

[array[0][0]for array in A]

This shows one of the benefit of numpy array over python list as the selection is much more easier (and faster) using numpy.

Selecting (n-1)D array from (n)D array in numpy

To get a face:

def get_face(M, dim, front_side):
if front_side:
side = 0
else:
side = -1
index = tuple(side if i == dim else slice(None) for i in range(M.ndim))
return M[index]

To add a face (untested):

def add_face(M, new_face, dim, front_side):
#assume sizes match up correctly
if front_side:
return np.concatenate((new_face, M), dim)
else:
return np.concatenate((M, new_face), dim)

To remove a face:

def remove_face(M, dim, front_side):
if front_side:
dim_slice = slice(1, None)
else:
dim_slice = slice(None, -1)
index = tuple(dim_slice if i == dim else slice(None) for i in range(M.ndim))
return M[index]

Iterate over all faces:

def iter_faces(M):
for dim in range(M.ndim):
for front_side in (True, False):
yield get_face(M, dim, front_side)

Some quick tests:

In [18]: M = np.arange(27).reshape((3,3,3))
In [19]: for face in iter_faces(M): print face
[[0 1 2]
[3 4 5]
[6 7 8]]
[[18 19 20]
[21 22 23]
[24 25 26]]
[[ 0 1 2]
[ 9 10 11]
[18 19 20]]
[[ 6 7 8]
[15 16 17]
[24 25 26]]
[[ 0 3 6]
[ 9 12 15]
[18 21 24]]
[[ 2 5 8]
[11 14 17]
[20 23 26]]

How can I select values along an axis of an nD array with an (n-1)D array of indices of that axis?

For the 2 and 1d case, this indexing works:

A[np.arange(J.shape[0]), J]

Which can be applied to more dimensions by reshaping to 2d (and back):

A.reshape(-1, A.shape[-1])[np.arange(np.prod(A.shape[:-1])).reshape(J.shape), J]

For 3d A this works:

A[np.arange(J.shape[0])[:,None], np.arange(J.shape[1])[None,:], J]

where the 1st 2 arange indices broadcast to the same dimension as J.

With functions in lib.index_tricks, this can be expressed as:

A[np.ogrid[0:J.shape[0],0:J.shape[1]]+[J]]
A[np.ogrid[slice(J.shape[0]),slice(J.shape[1])]+[J]]

or for multiple dimensions:

A[np.ix_(*[np.arange(x) for x in J.shape])+(J,)]
A[np.ogrid[[slice(k) for k in J.shape]]+[J]]

For small A and J (eg 2*3*4), J.choose(np.rollaxis(A,-1)) is faster. All of the extra time is in preparing the index tuple. np.ix_ is faster than np.ogrid.

np.choose has a size limit. At its upper end it is slower than ix_:

In [610]: Abig=np.arange(31*31).reshape(31,31)
In [611]: Jbig=np.arange(31)
In [612]: Jbig.choose(np.rollaxis(Abig,-1))
Out[612]:
array([ 0, 32, 64, 96, 128, 160, ... 960])

In [613]: timeit Jbig.choose(np.rollaxis(Abig,-1))
10000 loops, best of 3: 73.1 µs per loop
In [614]: timeit Abig[np.ix_(*[np.arange(x) for x in Jbig.shape])+(Jbig,)]
10000 loops, best of 3: 22.7 µs per loop
In [635]: timeit Abig.ravel()[Jbig+Abig.shape[-1]*np.arange(0,np.prod(Jbig.shape)).reshape(Jbig.shape) ]
10000 loops, best of 3: 44.8 µs per loop

I did similar indexing tests at https://stackoverflow.com/a/28007256/901925, and found that flat indexing was faster for much larger arrays (e.g. n0=1000). That's where I learned about the 32 limit for choice.

How to select values in a n-dimensional array

The tricky aspect is that you don't want all of the 0th-dimension for each slice, you want the slices to correspond to each element in the 0th-dimension. So you could do something like:

>>> x[np.arange(x.shape[0]), :, y]
array([[ 0, 2, 4],
[ 7, 9, 11],
[13, 15, 17],
[18, 20, 22]])

Index n dimensional array with (n-1) d array

Make use of advanced-indexing -

m,n = a.shape[1:]
I,J = np.ogrid[:m,:n]
a_max_values = a[idx, I, J]
b_max_values = b[idx, I, J]

For the general case:

def argmax_to_max(arr, argmax, axis):
"""argmax_to_max(arr, arr.argmax(axis), axis) == arr.max(axis)"""
new_shape = list(arr.shape)
del new_shape[axis]

grid = np.ogrid[tuple(map(slice, new_shape))]
grid.insert(axis, argmax)

return arr[tuple(grid)]

Quite a bit more awkward than such a natural operation should be, unfortunately.

For indexing a n dim array with a (n-1) dim array, we could simplify it a bit to give us the grid of indices for all axes, like so -

def all_idx(idx, axis):
grid = np.ogrid[tuple(map(slice, idx.shape))]
grid.insert(axis, idx)
return tuple(grid)

Hence, use it to index into input arrays -

axis = 0
a_max_values = a[all_idx(idx, axis=axis)]
b_max_values = b[all_idx(idx, axis=axis)]

Slicing n-dimensional numpy array using list of indices

Numpy uses multiple indexing, so instead of A[1][2][3], you can--and should--use A[1,2,3].

You might then think you could do A[:, second, third], but the numpy indices are broadcast, and broadcasting second and third (two one-dimensional sequences) ends up being the numpy equivalent of zip, so the result has shape (5, 2).

What you really want is to index with, in effect, the outer product of second and third. You can do this with broadcasting by making one of them, say second into a two-dimensional array with shape (2,1). Then the shape that results from broadcasting second and third together is (2,2).

For example:

In [8]: import numpy as np

In [9]: a = np.arange(125).reshape(5,5,5)

In [10]: second = [1,2]

In [11]: third = [3,4]

In [12]: s = a[:, np.array(second).reshape(-1,1), third]

In [13]: s.shape
Out[13]: (5, 2, 2)

Note that, in this specific example, the values in second and third are sequential. If that is typical, you can simply use slices:

In [14]: s2 = a[:, 1:3, 3:5]

In [15]: s2.shape
Out[15]: (5, 2, 2)

In [16]: np.all(s == s2)
Out[16]: True

There are a couple very important difference in those two methods.

  • The first method would also work with indices that are not equivalent to slices. For example, it would work if second = [0, 2, 3]. (Sometimes you'll see this style of indexing referred to as "fancy indexing".)
  • In the first method (using broadcasting and "fancy indexing"), the data is a copy of the original array. In the second method (using only slices), the array s2 is a view into the same block of memory used by a. An in-place change in one will change them both.


Related Topics



Leave a reply



Submit