Select along one of n dimensions in array
I spent quite a lot of time figuring out the fastest way to do this for plyr, and the best I could come up with was manually constructing the call to [
:
index_array <- function(x, dim, value, drop = FALSE) {
# Create list representing arguments supplied to [
# bquote() creates an object corresponding to a missing argument
indices <- rep(list(bquote()), length(dim(x)))
indices[[dim]] <- value
# Generate the call to [
call <- as.call(c(
list(as.name("["), quote(x)),
indices,
list(drop = drop)))
# Print it, just to make it easier to see what's going on
print(call)
# Finally, evaluate it
eval(call)
}
(You can find more information about this technique at https://github.com/hadley/devtools/wiki/Computing-on-the-language)
You can then use it as follows:
A <- array(data=NA, dim=c(2,4,4), dimnames=list(c("x","y"),NULL,NULL))
index_array(A, 2, 2)
index_array(A, 2, 2, drop = TRUE)
index_array(A, 3, 2, drop = TRUE)
It would also generalise in a straightforward way if you want to extract based on more than one dimension, but you'd need to rethink the arguments to the function.
Select and apply condition on numpy n-dimensional inner array
In [78]: arr = np.array([[[248, 26, 4],
...: [ 99, 126, 156],
...: [ 80, 240, 232],
...: [136, 27, 216]],
...:
...: [[221, 130, 119],
...: [253, 188, 232],
...: [159, 21, 98],
...: [ 12, 35, 50]]])
...:
Here's a 'masking' way of handling the >100
case:
In [79]: (arr>100)
Out[79]:
array([[[ True, False, False],
[False, True, True],
[False, True, True],
[ True, False, True]],
[[ True, True, True],
[ True, True, True],
[ True, False, False],
[False, False, False]]])
In [80]: (arr>100).all(axis=2)
Out[80]:
array([[False, False, False, False],
[ True, True, False, False]])
In [81]: arr[_]
Out[81]:
array([[221, 130, 119],
[253, 188, 232]])
In [82]: arr1=arr.copy()
In [83]: arr1[(arr>100).all(axis=2)]=255
In [84]: arr1
Out[84]:
array([[[248, 26, 4],
[ 99, 126, 156],
[ 80, 240, 232],
[136, 27, 216]],
[[255, 255, 255],
[255, 255, 255],
[159, 21, 98],
[ 12, 35, 50]]])
and for the 2nd condition, using 75 for arr[:,:,2]:
In [94]: arr1[(arr<np.array([25,50,75])).all(axis=2)]=0
In [95]: arr1
Out[95]:
array([[[248, 26, 4],
[ 99, 126, 156],
[ 80, 240, 232],
[136, 27, 216]],
[[255, 255, 255],
[255, 255, 255],
[159, 21, 98],
[ 0, 0, 0]]])
Here's a way of using these masks with select
. Note that I have to adjust the dimensions of the masks so they work with the 3d default
.
In [104]: mask1 = (arr>100).all(axis=2)
In [105]: mask2 = (arr<np.array([25,50,75])).all(axis=2)
In [106]: mask1.shape
Out[106]: (2, 4)
In [107]: arr.shape
Out[107]: (2, 4, 3)
In [108]: np.select([mask1[:,:,None],mask2[:,:,None]],[255,0], default=arr)
Out[108]:
array([[[248, 26, 4],
[ 99, 126, 156],
[ 80, 240, 232],
[136, 27, 216]],
[[255, 255, 255],
[255, 255, 255],
[159, 21, 98],
[ 0, 0, 0]]])
Or use:
In [109]: mask1 = (arr>100).all(axis=2, keepdims=True)
In [110]: mask1.shape
Out[110]: (2, 4, 1)
To use your syntax, we need to add () and use &
instead of and
:
In [111]: (arr[:,:,0] > 100) & (arr[:,:,1] > 100) & (arr[:,:,2] > 100)
Out[111]:
array([[False, False, False, False],
[ True, True, False, False]])
How to select elements from a multidimensional array with python : array[:][i][j]?
You can do as follows to select for all channel, the first element of the matrix:
A[:, 0, 0]
:
references all channels, 0
the first row and 0
the first column.
Output:
array([0.89565135,0.70610822,0.71822397])
EDIT:
If your are working with nested list
, you can do as follows:
[array[0][0]for array in A]
This shows one of the benefit of numpy
array over python list
as the selection is much more easier (and faster) using numpy
.
Selecting (n-1)D array from (n)D array in numpy
To get a face:
def get_face(M, dim, front_side):
if front_side:
side = 0
else:
side = -1
index = tuple(side if i == dim else slice(None) for i in range(M.ndim))
return M[index]
To add a face (untested):
def add_face(M, new_face, dim, front_side):
#assume sizes match up correctly
if front_side:
return np.concatenate((new_face, M), dim)
else:
return np.concatenate((M, new_face), dim)
To remove a face:
def remove_face(M, dim, front_side):
if front_side:
dim_slice = slice(1, None)
else:
dim_slice = slice(None, -1)
index = tuple(dim_slice if i == dim else slice(None) for i in range(M.ndim))
return M[index]
Iterate over all faces:
def iter_faces(M):
for dim in range(M.ndim):
for front_side in (True, False):
yield get_face(M, dim, front_side)
Some quick tests:
In [18]: M = np.arange(27).reshape((3,3,3))
In [19]: for face in iter_faces(M): print face
[[0 1 2]
[3 4 5]
[6 7 8]]
[[18 19 20]
[21 22 23]
[24 25 26]]
[[ 0 1 2]
[ 9 10 11]
[18 19 20]]
[[ 6 7 8]
[15 16 17]
[24 25 26]]
[[ 0 3 6]
[ 9 12 15]
[18 21 24]]
[[ 2 5 8]
[11 14 17]
[20 23 26]]
How can I select values along an axis of an nD array with an (n-1)D array of indices of that axis?
For the 2 and 1d case, this indexing works:
A[np.arange(J.shape[0]), J]
Which can be applied to more dimensions by reshaping to 2d (and back):
A.reshape(-1, A.shape[-1])[np.arange(np.prod(A.shape[:-1])).reshape(J.shape), J]
For 3d A
this works:
A[np.arange(J.shape[0])[:,None], np.arange(J.shape[1])[None,:], J]
where the 1st 2 arange
indices broadcast to the same dimension as J
.
With functions in lib.index_tricks
, this can be expressed as:
A[np.ogrid[0:J.shape[0],0:J.shape[1]]+[J]]
A[np.ogrid[slice(J.shape[0]),slice(J.shape[1])]+[J]]
or for multiple dimensions:
A[np.ix_(*[np.arange(x) for x in J.shape])+(J,)]
A[np.ogrid[[slice(k) for k in J.shape]]+[J]]
For small A
and J
(eg 2*3*4), J.choose(np.rollaxis(A,-1))
is faster. All of the extra time is in preparing the index tuple. np.ix_
is faster than np.ogrid
.
np.choose
has a size limit. At its upper end it is slower than ix_
:
In [610]: Abig=np.arange(31*31).reshape(31,31)
In [611]: Jbig=np.arange(31)
In [612]: Jbig.choose(np.rollaxis(Abig,-1))
Out[612]:
array([ 0, 32, 64, 96, 128, 160, ... 960])
In [613]: timeit Jbig.choose(np.rollaxis(Abig,-1))
10000 loops, best of 3: 73.1 µs per loop
In [614]: timeit Abig[np.ix_(*[np.arange(x) for x in Jbig.shape])+(Jbig,)]
10000 loops, best of 3: 22.7 µs per loop
In [635]: timeit Abig.ravel()[Jbig+Abig.shape[-1]*np.arange(0,np.prod(Jbig.shape)).reshape(Jbig.shape) ]
10000 loops, best of 3: 44.8 µs per loop
I did similar indexing tests at https://stackoverflow.com/a/28007256/901925, and found that flat
indexing was faster for much larger arrays (e.g. n0=1000
). That's where I learned about the 32 limit for choice
.
How to select values in a n-dimensional array
The tricky aspect is that you don't want all of the 0th-dimension for each slice, you want the slices to correspond to each element in the 0th-dimension. So you could do something like:
>>> x[np.arange(x.shape[0]), :, y]
array([[ 0, 2, 4],
[ 7, 9, 11],
[13, 15, 17],
[18, 20, 22]])
Index n dimensional array with (n-1) d array
Make use of advanced-indexing
-
m,n = a.shape[1:]
I,J = np.ogrid[:m,:n]
a_max_values = a[idx, I, J]
b_max_values = b[idx, I, J]
For the general case:
def argmax_to_max(arr, argmax, axis):
"""argmax_to_max(arr, arr.argmax(axis), axis) == arr.max(axis)"""
new_shape = list(arr.shape)
del new_shape[axis]
grid = np.ogrid[tuple(map(slice, new_shape))]
grid.insert(axis, argmax)
return arr[tuple(grid)]
Quite a bit more awkward than such a natural operation should be, unfortunately.
For indexing a n dim
array with a (n-1) dim
array, we could simplify it a bit to give us the grid of indices for all axes, like so -
def all_idx(idx, axis):
grid = np.ogrid[tuple(map(slice, idx.shape))]
grid.insert(axis, idx)
return tuple(grid)
Hence, use it to index into input arrays -
axis = 0
a_max_values = a[all_idx(idx, axis=axis)]
b_max_values = b[all_idx(idx, axis=axis)]
Slicing n-dimensional numpy array using list of indices
Numpy uses multiple indexing, so instead of A[1][2][3]
, you can--and should--use A[1,2,3]
.
You might then think you could do A[:, second, third]
, but the numpy indices are broadcast, and broadcasting second
and third
(two one-dimensional sequences) ends up being the numpy equivalent of zip
, so the result has shape (5, 2)
.
What you really want is to index with, in effect, the outer product of second
and third
. You can do this with broadcasting by making one of them, say second
into a two-dimensional array with shape (2,1). Then the shape that results from broadcasting second
and third
together is (2,2)
.
For example:
In [8]: import numpy as np
In [9]: a = np.arange(125).reshape(5,5,5)
In [10]: second = [1,2]
In [11]: third = [3,4]
In [12]: s = a[:, np.array(second).reshape(-1,1), third]
In [13]: s.shape
Out[13]: (5, 2, 2)
Note that, in this specific example, the values in second
and third
are sequential. If that is typical, you can simply use slices:
In [14]: s2 = a[:, 1:3, 3:5]
In [15]: s2.shape
Out[15]: (5, 2, 2)
In [16]: np.all(s == s2)
Out[16]: True
There are a couple very important difference in those two methods.
- The first method would also work with indices that are not equivalent to slices. For example, it would work if
second = [0, 2, 3]
. (Sometimes you'll see this style of indexing referred to as "fancy indexing".) - In the first method (using broadcasting and "fancy indexing"), the data is a copy of the original array. In the second method (using only slices), the array
s2
is a view into the same block of memory used bya
. An in-place change in one will change them both.
Related Topics
Filling in Missing (Blanks) in a Data Table, Per Category - Backwards and Forwards
Shinydashboard Some Font Awesome Icons Not Working
Select Along One of N Dimensions in Array
Efficient String Similarity Grouping
How to Cumulatively Add Values in One Vector in R
Set the Order of a Stacked Bar Chart by the Value of One of the Variables
How to Scrape/Automatically Download PDF Files from a Document Search Web Interface in R
How to Specify Columns in Facet_Grid or How to Change Labels in Facet_Wrap
Error: Vector Memory Exhausted (Limit Reached) R 3.5.0 MACos
Group Data and Plot Multiple Lines
Plot Data Over Background Image with Ggplot
How to Use R Plotly Library in R Script Visual of Power Bi
What Type of Graph Is This? and Can It Be Created Using Ggplot2
How to Select R Data.Table Rows Based on Substring Match (A La SQL Like)