Selecting Multiple Slices from a Numpy Array at Once

Selecting multiple slices from a numpy array at once

You can build an array of row indexes and use it to select the rows you want, already arranged in the appropriate shape.
For example:

import numpy as np

data = np.random.normal(size=(100,2,2,2))

# Creating an array of row-indexes
indexes = np.array([np.arange(0,5), np.arange(1,6), np.arange(2,7)])
# data[indexes] will return an array of shape (3,5,2,2,2). Converting
# to list happens along axis 0
data_extractions = list(data[indexes])

np.all(data_extractions[1] == data[1:6])
True

The final comparison confirms that the second extraction equals the slice data[1:6] of the original data.
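The index array above is written out by hand. As a sketch, assuming every window has the same length, the same array could be built by broadcasting a column of start offsets against a row of within-window offsets. The names `starts` and `width` are illustrative and not part of the original answer.

```python
import numpy as np

data = np.random.normal(size=(100, 2, 2, 2))

# Hypothetical parameters: three windows of five rows, starting at rows 0, 1, 2.
starts = np.array([0, 1, 2])
width = 5

# (3, 1) + (5,) broadcasts to a (3, 5) array of row indexes.
indexes = starts[:, None] + np.arange(width)

# Advanced indexing on axis 0 copies the rows into a (3, 5, 2, 2, 2) array.
windows = data[indexes]
print(windows.shape)
print(np.array_equal(windows[1], data[1:6]))
```

Because this is advanced indexing, `windows` is a copy, so memory use grows with the number of windows.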

Select from multiple slices in Numpy

Do you know what r_ does? It converts the slices into ranges, and then concatenates the whole mess together.

I don't know if you can use r_ or something similar to construct the required indices. But:

In [168]: idx = np.where(a==0)
In [169]: idx
Out[169]:
(array([0, 0, 0, 0, 0, 1, 2]),
array([0, 1, 1, 1, 2, 1, 1]),
array([0, 0, 1, 2, 0, 0, 0]))

This gives us an idea of the required indexing arrays (minus some likely duplicates).
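The array a is not shown in this answer. As an assumption, one array that is consistent with the np.where output above is a 3x3x3 array of ones with zeros along three lines that cross at a[0, 1, 0]:

```python
import numpy as np

# Assumed reconstruction of `a` (not given in the original answer):
# ones everywhere, zeros along three axis-aligned lines.
a = np.ones((3, 3, 3))
a[0, 1, :] = 0
a[0, :, 0] = 0
a[:, 1, 0] = 0

# One index array per axis, listing the positions of the zeros.
idx = np.where(a == 0)
print(idx)
```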


It might be possible to concatenate these 3 ogrid lists into a composite:

In [181]: np.ogrid[0:1,1:2,:3]
Out[181]: [array([[[0]]]), array([[[1]]]), array([[[0, 1, 2]]])]

In [182]: np.ogrid[0:1,:3,0:1]
Out[182]:
[array([[[0]]]), array([[[0],
        [1],
        [2]]]), array([[[0]]])]

In [183]: np.ogrid[:3,1:2,0:1]
Out[183]:
[array([[[0]],

       [[1]],

       [[2]]]), array([[[1]]]), array([[[0]]])]

Individually they select the 0s in a.

It may be easiest to convert them into their raveled equivalents, and join the resulting 1d arrays.

In [188]: np.ravel_multi_index(Out[181],(3,3,3))
Out[188]: array([[[3, 4, 5]]])
etc
In [195]: np.hstack([Out[188].ravel(), Out[189].ravel(), Out[190].ravel()])
Out[195]: array([ 3, 4, 5, 0, 3, 6, 3, 12, 21])
In [197]: a.flat[_]
Out[197]: array([0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [199]: np.unravel_index(Out[195],(3,3,3))
Out[199]:
(array([0, 0, 0, 0, 0, 0, 0, 1, 2]),
array([1, 1, 1, 0, 1, 2, 1, 1, 1]),
array([0, 1, 2, 0, 0, 0, 0, 0, 0]))

Out[169] and Out[199] have the same values, except for duplicates.

This is a generalization of the problem of joining several 1d slices. Indexing and then concatenating takes about as much time as concatenating the indices first.
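The raveled-index steps can be put together as a short sketch. It assumes the same reconstructed a as a stand-in for the array that the answer never shows, and it adds an np.unique call, which is not in the original session, to drop the duplicated positions.

```python
import numpy as np

# Assumed stand-in for `a`: zeros along three lines of a 3x3x3 array of ones.
a = np.ones((3, 3, 3))
a[0, 1, :] = 0
a[0, :, 0] = 0
a[:, 1, 0] = 0

# The three open grids from the session above, one per line of zeros.
grids = [np.ogrid[0:1, 1:2, :3], np.ogrid[0:1, :3, 0:1], np.ogrid[:3, 1:2, 0:1]]

# Convert each grid to flat indices and join them into one 1d array.
# tuple() guards against NumPy versions where ogrid returns a list.
flat = np.hstack([np.ravel_multi_index(tuple(g), a.shape).ravel() for g in grids])

# Optional extra step: remove duplicates (np.unique also sorts the indices).
flat_unique = np.unique(flat)

print(flat)
print(a.flat[flat_unique])
```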

Numpy: An efficient way to merge multiple slices

You can use np.r_ to create the respective range of indices from your slices. It also accepts multiple slices at once.

In [25]: test_array[:, np.r_[1:3, 2:4, 3:15, 2:24, 6:8, 12:13]]
Out[25]:
array([[ 1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 2,
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 6, 7, 12],
[26, 27, 27, 28, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 27,
28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48, 31, 32, 37],
[51, 52, 52, 53, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 52,
53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 56, 57, 62],
[76, 77, 77, 78, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94,
95, 96, 97, 98, 81, 82, 87]])

Note that, as mentioned in the comments, using r_ is nicer to read and write but doesn't avoid copying data. That's because advanced indexing always returns a copy, unlike basic slicing, which returns a view of the array.
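The copy-versus-view difference can be checked with np.shares_memory. This is a small sketch; the definition of test_array is an assumption inferred from the output above, and the shorter slice list is chosen only to keep the example small.

```python
import numpy as np

# Assumed input, inferred from the printed output (4 rows of 25 values).
test_array = np.arange(100).reshape(4, 25)

cols = np.r_[1:3, 6:8]         # the slices expand to array([1, 2, 6, 7])
picked = test_array[:, cols]   # advanced indexing: a new array
view = test_array[:, 1:3]      # basic slicing: a view of test_array

print(np.shares_memory(test_array, picked))  # False
print(np.shares_memory(test_array, view))    # True
```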

How can I define multiple slices of a numpy array based on pairs of start/end indices without iterating?

Approach #1

One vectorized approach is to build a mask with broadcasting -

In [16]: r = np.arange(len(x))

In [18]: x[((r>=starts[:,None]) & (r<ends[:,None])).any(0)]
Out[18]: array([ 5, 7, 9, 21, 27])
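The inputs x, starts and ends are not shown in this answer. The sketch below uses made-up sample values to show the intermediate mask, which has one row per start-end pair, so it needs len(starts) * len(x) booleans.

```python
import numpy as np

# Hypothetical sample data (the original inputs are not shown).
x = np.arange(10) * 10
starts = np.array([1, 5])
ends = np.array([3, 8])

r = np.arange(len(x))

# Row k of `inside` is True where starts[k] <= r < ends[k]; shape (2, 10).
inside = (r >= starts[:, None]) & (r < ends[:, None])

# A position is selected if it falls inside any of the intervals.
mask = inside.any(0)
out = x[mask]
print(out)  # [10 20 50 60 70]
```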

Approach #2

Another vectorized way marks each start with +1 and each end with -1, then takes a cumulative sum to get a ramp of 1s and 0s (this should be better with many start-end pairs), like so -

idx = np.zeros(len(x),dtype=int)
idx[starts] = 1
idx[ends[ends<len(x)]] = -1
out = x[idx.cumsum().astype(bool)]
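One case the snippet above does not seem to cover is an interval that ends exactly where another begins, because the plain assignments let the -1 overwrite the +1 at the shared position. A possible variant, which is my own sketch and not part of the original answer, accumulates the markers with np.add.at instead. The sample values are made up.

```python
import numpy as np

# Hypothetical sample data: the first interval ends where the second begins.
x = np.arange(6) * 10
starts = np.array([1, 3])
ends = np.array([3, 5])

idx = np.zeros(len(x), dtype=int)

# np.add.at accumulates, so a +1 and a -1 at the same position cancel out
# instead of one overwriting the other.
np.add.at(idx, starts, 1)
np.add.at(idx, ends[ends < len(x)], -1)

# The running sum is positive exactly inside at least one interval.
out = x[idx.cumsum() > 0]
print(out)  # [10 20 30 40]
```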

Approach #3

A loop-based alternative that is more memory-efficient, since it never builds the 2D broadcast mask; it could be the better choice when there are many start-end pairs -

mask = np.zeros(len(x),dtype=bool)
for (i,j) in zip(starts,ends):
    mask[i:j] = True
out = x[mask]

Approach #4

For completeness, here's another loop-based approach that copies each slice into a preallocated output array; it should work well when the slices are taken from a large array -

lens = ends-starts
out = np.empty(lens.sum(),dtype=x.dtype)
start = 0
for (i,j,l) in zip(starts,ends,lens):
    out[start:start+l] = x[i:j]
    start += l

If there are many iterations, a minor optimization is to precompute the output offsets, which reduces the work done per iteration -

lens = ends-starts
lims = np.r_[0,lens].cumsum()
out = np.empty(lims[-1],dtype=x.dtype)
for (i,j,s,t) in zip(starts,ends,lims[:-1],lims[1:]):
    out[s:t] = x[i:j]
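As a rough check, the preallocating loop can be wrapped in a function and compared with np.concatenate over the same slices. The function name take_slices and the sample values are illustrative assumptions.

```python
import numpy as np

def take_slices(x, starts, ends):
    """Copy x[s:e] for each (s, e) pair into one preallocated array."""
    lens = ends - starts
    # lims[k] is the offset in `out` where the k-th slice begins.
    lims = np.r_[0, lens].cumsum()
    out = np.empty(lims[-1], dtype=x.dtype)
    for i, j, s, t in zip(starts, ends, lims[:-1], lims[1:]):
        out[s:t] = x[i:j]
    return out

# Hypothetical sample data.
x = np.arange(10) * 10
starts = np.array([1, 5])
ends = np.array([3, 8])

out = take_slices(x, starts, ends)
reference = np.concatenate([x[i:j] for i, j in zip(starts, ends)])
print(out)  # [10 20 50 60 70]
```

Unlike the mask-based approaches, this one keeps the slices in the order given and repeats elements when slices overlap.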

How to extract multiple slices in an array?

For a Python list, you can slice twice and concatenate the results with +.

listing[0:3] + listing[4:5]
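The + operator concatenates only for Python lists; on NumPy arrays it adds element by element. A short sketch with made-up data shows the list version next to an array equivalent that uses np.concatenate:

```python
import numpy as np

# Python list: + joins the two slices.
listing = list(range(10))
picked_list = listing[0:3] + listing[4:5]
print(picked_list)  # [0, 1, 2, 4]

# NumPy array: join the slices explicitly instead of using +.
arr = np.arange(10)
picked_arr = np.concatenate([arr[0:3], arr[4:5]])
print(picked_arr)  # [0 1 2 4]
```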

Select different slices from each numpy row

You can use advanced indexing. Pass the row ids, which are [0, 1] in your case, together with the column ids [2, 3] and [1, 2]. Here [2, 3] stands for the slice [2:4] and [1, 2] for the slice [1:3].

import numpy as np
a=np.arange(2*3*5).reshape(2, 3, 5)

rows = np.array([[0], [1]], dtype=np.intp)
cols = np.array([[2, 3], [1, 2]], dtype=np.intp)

aa = np.stack(a[rows, :, cols]).swapaxes(1, 2)
# array([[[ 2,  3],
#         [ 7,  8],
#         [12, 13]],
#
#        [[16, 17],
#         [21, 22],
#         [26, 27]]])

An equivalent way that avoids swapaxes and still gives the result in the desired format is

aa = np.stack(a[rows, :, cols], axis=2).T

A third way I figured out is to pass the lists of indices directly. Here the row ids [0, 0] correspond to the columns [2, 3] and [1, 1] to [1, 2]. The swapaxes is just to get your desired output format.

a[[[0,0], [1,1]], :, [[2,3], [1,2]]].swapaxes(1,2)
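When every row needs a slice of the same width, the column index array can be generated from the slice starts instead of being typed out. This is a sketch under that assumption; the names `starts` and `width` are illustrative.

```python
import numpy as np

a = np.arange(2 * 3 * 5).reshape(2, 3, 5)

# Hypothetical inputs: row 0 takes columns 2:4, row 1 takes columns 1:3.
starts = np.array([2, 1])
width = 2

rows = np.arange(a.shape[0])[:, None]        # shape (2, 1)
cols = starts[:, None] + np.arange(width)    # shape (2, 2): [[2, 3], [1, 2]]

# The advanced indexes are separated by a slice, so the broadcast
# dimensions come first: (2, 2, 3). swapaxes gives (2, 3, 2).
aa = a[rows, :, cols].swapaxes(1, 2)

# Reference result built from ordinary slices.
expected = np.stack([a[0, :, 2:4], a[1, :, 1:3]])
print(np.array_equal(aa, expected))  # True
```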

