Taking Subarrays from Numpy Array With Given Stride/Stepsize

Taking subarrays from numpy array with given stride/stepsize

Approach #1 : Using broadcasting -

import numpy as np

def broadcasting_app(a, L, S):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size - L) // S) + 1
    return a[S * np.arange(nrows)[:, None] + np.arange(L)]

Approach #2 : Using more efficient NumPy strides -

def strided_app(a, L, S):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size - L) // S) + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))

Sample run -

In [143]: a
Out[143]: array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [144]: broadcasting_app(a, L = 5, S = 3)
Out[144]:
array([[ 1,  2,  3,  4,  5],
       [ 4,  5,  6,  7,  8],
       [ 7,  8,  9, 10, 11]])

In [145]: strided_app(a, L = 5, S = 3)
Out[145]:
array([[ 1,  2,  3,  4,  5],
       [ 4,  5,  6,  7,  8],
       [ 7,  8,  9, 10, 11]])
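
As a possible alternative to both approaches, NumPy 1.20 and later provide sliding_window_view, which does the bounds checking itself and returns a read-only view. The sketch below is not part of the original answer; windowed_app is a hypothetical helper name, and the sample array is assumed to match the one in the sample run.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windowed_app(a, L, S):
    """Hypothetical helper: windows of length L taken every S elements."""
    # sliding_window_view produces every length-L window (step 1) as a view;
    # slicing with [::S] then keeps every S-th window.
    return sliding_window_view(a, L)[::S]

a = np.arange(1, 12)              # same values as the sample run: 1..11
result = windowed_app(a, L=5, S=3)
print(result)
```

The result is still a view of a, so no window data is copied until the output is modified or reshaped.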

Reshaping numpy array into small subslices

You can use numpy stride tricks (numpy.lib.stride_tricks.as_strided) to create a new view of the array. This will be faster than any list comprehension because no data are copied. The IPython Cookbook has more examples of using stride tricks.

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
bytes_per_item = a.dtype.itemsize
b = np.lib.stride_tricks.as_strided(
    a, shape=(8, 3), strides=(bytes_per_item, bytes_per_item))

Result (b):

array([[ 1,  2,  3],
       [ 2,  3,  4],
       [ 3,  4,  5],
       [ 4,  5,  6],
       [ 5,  6,  7],
       [ 6,  7,  8],
       [ 7,  8,  9],
       [ 8,  9, 10]])
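
To illustrate the "no data are copied" point, here is a minimal sketch (not from the original answer) that checks the result shares memory with the input. It uses a.strides[0] instead of itemsize as the step, on the assumption that this is the safer choice when the input might be a non-contiguous slice such as a[::2], where the two values differ.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
width = 3
step = a.strides[0]  # bytes between consecutive elements of a

# Read-only view with one row per window of length `width`
b = as_strided(a, shape=(a.size - width + 1, width),
               strides=(step, step), writeable=False)

shares = np.shares_memory(a, b)   # True: b is a view, not a copy

# Because b is a view, a change to a shows up in b immediately
a[0] = 100
first_row = b[0].tolist()
print(shares, first_row)
```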


Timed tests

This answer is orders of magnitude faster than the answers here that use loops (about 9 µs versus 140 to 170 ms in the tests below, which were run in a Jupyter Notebook with the %timeit magic). Note that one of the functions (daveldito) does not work properly with NumPy arrays and requires a Python list.

Setup

import numpy as np

a = np.arange(1, 100001, dtype=np.int64)
a_list = a.tolist()

def jakub(a, shape):
    a = np.asarray(a)
    bytes_per_item = a.dtype.itemsize
    # The docs for this function recommend setting `writeable=False` to
    # prevent modifying the underlying array.
    return np.lib.stride_tricks.as_strided(
        a, shape=shape, strides=(bytes_per_item, bytes_per_item), writeable=False)

# https://stackoverflow.com/a/63426256/5666087
def daveldito(arr):
    return np.array([arr[each:each+2]+[arr[each+2]] for each in range(len(arr)-2)])

# https://stackoverflow.com/a/63426205/5666087
def akshay_sehgal(a):
    return np.array([i for i in zip(a,a[1:],a[2:])])

Results

%timeit jakub(a, shape=(a.shape[0]-2, 3))
8.85 µs ± 425 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit daveldito(a_list)
141 ms ± 8.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit akshay_sehgal(a)
168 ms ± 9.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
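
The remark that one function needs a Python list comes down to what + means for each type. A small sketch of the expression used inside daveldito, with made-up sample values, shows the difference:

```python
import numpy as np

arr_list = [1, 2, 3, 4]          # illustrative values, not from the timing test
arr_np = np.array(arr_list)

# list + list concatenates, which is what daveldito relies on
from_list = arr_list[0:2] + [arr_list[2]]

# ndarray + list broadcasts and adds elementwise instead
from_array = (arr_np[0:2] + [arr_np[2]]).tolist()

print(from_list)    # [1, 2, 3]
print(from_array)   # [4, 5]
```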

Can numpy strides stride only within subarrays?

Sure, that's possible with np.lib.stride_tricks.as_strided. Here's one way -

from numpy.lib.stride_tricks import as_strided

L = 2 # window length
shp = a.shape
strd = a.strides

out_shp = shp[0],shp[1]-L+1,L
out_strd = strd + (strd[1],)

out = as_strided(a, out_shp, out_strd).reshape(-1,L)

Sample input, output -

In [177]: a
Out[177]:
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [178]: out
Out[178]:
array([[0, 1],
       [1, 2],
       [2, 3],
       [4, 5],
       [5, 6],
       [6, 7]])

Note that the final reshape forces a copy. That can't be avoided if the final output has to be 2D. If a 3D output is acceptable, skip the reshape and get a view instead, as shown with the sample case -

In [181]: np.shares_memory(a, out)
Out[181]: False

In [182]: as_strided(a, out_shp, out_strd)
Out[182]:
array([[[0, 1],
        [1, 2],
        [2, 3]],

       [[4, 5],
        [5, 6],
        [6, 7]]])

In [183]: np.shares_memory(a, as_strided(a, out_shp, out_strd) )
Out[183]: True
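
On NumPy 1.20 or later, the same per-row windows can probably be obtained without computing strides by hand, using sliding_window_view with axis=1. This is a sketch of that alternative rather than the method from the answer above; the input is assumed to be the same 2x4 sample array.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(8).reshape(2, 4)   # same sample input as above
L = 2                            # window length

# Windows slide along axis 1 only, so they never cross a row boundary.
# The window dimension is appended last: shape (2, 3, 2).
windows = sliding_window_view(a, L, axis=1)

# Flattening to 2D forces a copy, as with the as_strided version
flat = windows.reshape(-1, L)

print(windows.shape, np.shares_memory(a, windows))
print(flat.shape, np.shares_memory(a, flat))
```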

How can I divide a numpy array into n sub-arrays using a sliding window of size m?

I think your current method does not produce what you are describing.
Here is a faster method that splits a long array into many sub-arrays using a list comprehension:

Code Fix:

import numpy as np 

x = np.arange(10000)
T = np.array([])

T = np.array([np.array(x[i:i+11]) for i in range(len(x)-11)])

Speed Comparison:

sample_1 = '''
import numpy as np

x = np.arange(10000)
T = np.array([])

for i in range(len(x)-11):
    s = x[i:i+11]
    T = np.concatenate((T, s),axis=0)

'''

sample_2 = '''
import numpy as np

x = np.arange(10000)
T = np.array([])

T = np.array([np.array(x[i:i+11]) for i in range(len(x)-11)])
'''

# Testing the times
import timeit
print(timeit.timeit(sample_1, number=1))
print(timeit.timeit(sample_2, number=1))

Speed Comparison Output:

5.839815437000652   # Your method
0.11047088200211874 # List Comprehension

I only checked 1 iteration as the difference is quite significant and many iterations would not change the overall outcome.

Output Comparison:

# Your method:
[ 0.00000000e+00 1.00000000e+00 2.00000000e+00 ..., 9.99600000e+03
9.99700000e+03 9.99800000e+03]

# Using List Comprehension:
[[ 0 1 2 ..., 8 9 10]
[ 1 2 3 ..., 9 10 11]
[ 2 3 4 ..., 10 11 12]
...,
[9986 9987 9988 ..., 9994 9995 9996]
[9987 9988 9989 ..., 9995 9996 9997]
[9988 9989 9990 ..., 9996 9997 9998]]

You can see that my method actually produces sub-arrays, unlike what your provided code does.

Note:

These tests were carried out on x, a NumPy array of consecutive integers from 0 to 9999 (np.arange(10000)).
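
One detail worth checking: the bound range(len(x)-11) used above stops one window short, which is why the last row shown ends at 9998 rather than 9999. Assuming the intent is to keep every full window of length m, the bound would be len(x) - m + 1. A minimal sketch comparing the two (the names T_original and T_full are illustrative):

```python
import numpy as np

x = np.arange(10000)
m = 11  # window length

# Bound used in the answer above: the final window is left out
T_original = np.array([x[i:i + m] for i in range(len(x) - m)])

# Bound that includes the final full window (assumed intent)
T_full = np.array([x[i:i + m] for i in range(len(x) - m + 1)])

print(T_original.shape, T_original[-1, -1])   # (9989, 11) 9998
print(T_full.shape, T_full[-1, -1])           # (9990, 11) 9999
```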

Extract subarrays from 1D array given start indices - Python / NumPy

Use broadcasted addition to create all those indices and then index -

all_idx = ids[:,None]+range(4) # or np.add.outer(ids, range(4))
out = arr[all_idx]

Using np.lib.stride_tricks.as_strided based strided_app -

strided_app(arr, 4, S=1)[ids]
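
A self-contained run of the broadcasting approach, with made-up values for arr and ids and a window length W of 4 as in the snippet above, might look like this:

```python
import numpy as np

arr = np.arange(10, 20)        # illustrative data: 10..19
ids = np.array([0, 3, 5])      # illustrative start indices
W = 4                          # window length

# Column vector of starts + row vector of offsets -> (len(ids), W) index grid
all_idx = ids[:, None] + np.arange(W)
out = arr[all_idx]
print(out)
```

Every start index plus W must stay within arr.size, otherwise the fancy indexing raises an IndexError.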

How can I generate an array based on the grouped values of another array in Numpy?

How about this?

import numpy as np
from numpy.lib import stride_tricks
Z = np.arange(1,15,dtype=np.uint32)
R = stride_tricks.as_strided(Z,(11,4),(4,4))
print(R)

Output:

[[ 1  2  3  4]
 [ 2  3  4  5]
 [ 3  4  5  6]
 [ 4  5  6  7]
 [ 5  6  7  8]
 [ 6  7  8  9]
 [ 7  8  9 10]
 [ 8  9 10 11]
 [ 9 10 11 12]
 [10 11 12 13]
 [11 12 13 14]]

As navneethc rightly pointed out, this function should be used with caution.
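
The caution exists because as_strided does not check that the requested shape and strides stay inside the array's memory. One way to reduce the risk is to derive the shape and strides from the array itself and validate the window width first. The sketch below is an assumption about how that could be done; checked_windows is a hypothetical helper, not a NumPy function.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def checked_windows(z, width):
    """Hypothetical helper: read-only sliding windows over a 1D array."""
    z = np.asarray(z)
    if z.ndim != 1:
        raise ValueError("expected a 1D array")
    if not 1 <= width <= z.size:
        raise ValueError("width must be between 1 and the array size")
    step = z.strides[0]            # taken from the array, not hard-coded
    nrows = z.size - width + 1     # number of full windows
    return as_strided(z, shape=(nrows, width),
                      strides=(step, step), writeable=False)

Z = np.arange(1, 15, dtype=np.uint32)
R = checked_windows(Z, 4)
print(R.shape)    # (11, 4)
```

Hard-coding the strides as (4,4) works for uint32 only; reading them from Z.strides keeps the call valid if the dtype changes.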

Is there a fast way to get all neighbor elements in a list?

Use as_strided from numpy.lib.stride_tricks (letters must be a NumPy array)

import numpy as np
from numpy.lib.stride_tricks import as_strided

n = len(letters) - 1
m = np.array(letters[0]).itemsize

arr = as_strided(letters, shape=(n, 2), strides=(m, m))

Out[289]:
array([['a', 'b'],
       ['b', 'c'],
       ['c', 'd'],
       ['d', 'e'],
       ['e', 'f'],
       ['f', 'g'],
       ['g', 'h'],
       ['h', 'i'],
       ['i', 'j'],
       ['j', 'k']], dtype='<U1')
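
For reference, a self-contained version of the snippet above could look like the following. The definition of letters is an assumption (the letters a through k as a NumPy array of dtype '<U1', matching the output shown), and the step is read from letters.strides instead of being computed from a single element.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

letters = np.array(list("abcdefghijk"))   # assumed input, dtype '<U1'

n = len(letters) - 1        # number of neighbouring pairs
m = letters.strides[0]      # bytes between consecutive elements

# Read-only view where row i holds (letters[i], letters[i + 1])
pairs = as_strided(letters, shape=(n, 2), strides=(m, m), writeable=False)
print(pairs)
```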

