Taking subarrays from numpy array with given stride/stepsize
Approach #1: Using broadcasting -
import numpy as np

def broadcasting_app(a, L, S):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size - L) // S) + 1
    # Row start offsets (column vector) plus in-window offsets (row vector)
    return a[S * np.arange(nrows)[:, None] + np.arange(L)]
Approach #2: Using more efficient NumPy strides -
def strided_app(a, L, S):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size - L) // S) + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))
Sample run -
In [143]: a
Out[143]: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
In [144]: broadcasting_app(a, L = 5, S = 3)
Out[144]:
array([[ 1, 2, 3, 4, 5],
[ 4, 5, 6, 7, 8],
[ 7, 8, 9, 10, 11]])
In [145]: strided_app(a, L = 5, S = 3)
Out[145]:
array([[ 1, 2, 3, 4, 5],
[ 4, 5, 6, 7, 8],
[ 7, 8, 9, 10, 11]])
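On NumPy 1.20 and later, the same windows can probably also be obtained with sliding_window_view, which validates the window size and returns a read-only view. The following is only a sketch, not part of the original answer; the helper name windows_with_step is hypothetical, and the step is applied by slicing the rows of the full step-1 window view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windows_with_step(a, L, S):
    """Hypothetical helper: windows of length L taken every S elements."""
    # sliding_window_view yields every window (step 1); slicing the rows
    # with [::S] keeps every S-th window and is still a view, not a copy.
    return sliding_window_view(a, L)[::S]

a = np.arange(1, 12)  # same sample data as above: 1..11
out = windows_with_step(a, L=5, S=3)
print(out)
```

Because the result is read-only, accidental writes through the overlapping windows raise an error instead of silently changing a.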
reshaping numpy array into small subslices
You can use NumPy stride tricks (numpy.lib.stride_tricks.as_strided) to create a new view of the array. This will be faster than any list comprehension because no data are copied. The IPython Cookbook has more examples of using stride tricks.
import numpy as np
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
bytes_per_item = a.dtype.itemsize
b = np.lib.stride_tricks.as_strided(
a, shape=(8, 3), strides=(bytes_per_item, bytes_per_item))
array([[ 1, 2, 3],
[ 2, 3, 4],
[ 3, 4, 5],
[ 4, 5, 6],
[ 5, 6, 7],
[ 6, 7, 8],
[ 7, 8, 9],
[ 8, 9, 10]])
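The snippet above hard-codes the shape (8, 3) and uses the item size as the stride, which holds only for a contiguous 1D array. A minimal sketch of a more general version follows; the function name rolling_windows and the validation rule are assumptions for illustration. It derives the shape from the window width and takes the step from a.strides[0], so it also works on a non-contiguous slice.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def rolling_windows(a, width):
    """Hypothetical helper: all length-`width` windows of a 1D array, as a view."""
    a = np.asarray(a)
    if a.ndim != 1 or not 1 <= width <= a.size:
        raise ValueError("need a 1D array and 1 <= width <= a.size")
    step = a.strides[0]  # bytes between consecutive elements of `a`
    return as_strided(a, shape=(a.size - width + 1, width),
                      strides=(step, step), writeable=False)

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
b = rolling_windows(a, 3)                 # shape (8, 3), as in the example above

every_other = np.arange(1, 21)[::2]       # non-contiguous: 1, 3, 5, ..., 19
c = rolling_windows(every_other, 3)       # itemsize would be the wrong stride here
print(b.shape, c[0], c[-1])
```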
Timed tests
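This answer is roughly four orders of magnitude faster than the answers here that use loops (about 9 µs versus 140 to 170 ms in the results below), largely because it only creates a view while the loop-based versions build a new array. Find the tests below (done in Jupyter Notebook with the %timeit magic). Note that one of the functions does not work properly with numpy arrays and requires a Python list.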
Setup
import numpy as np
a = np.arange(1, 100001, dtype=np.int64)
a_list = a.tolist()
def jakub(a, shape):
    a = np.asarray(a)
    bytes_per_item = a.dtype.itemsize
    # The docs for this function recommend setting `writeable=False` to
    # prevent modifying the underlying array.
    return np.lib.stride_tricks.as_strided(
        a, shape=shape, strides=(bytes_per_item, bytes_per_item), writeable=False)

# https://stackoverflow.com/a/63426256/5666087
def daveldito(arr):
    return np.array([arr[each:each+2]+[arr[each+2]] for each in range(len(arr)-2)])

# https://stackoverflow.com/a/63426205/5666087
def akshay_sehgal(a):
    return np.array([i for i in zip(a,a[1:],a[2:])])
Results
%timeit jakub(a, shape=(a.shape[0]-2, 3))
8.85 µs ± 425 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit daveldito(a_list)
141 ms ± 8.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit akshay_sehgal(a)
168 ms ± 9.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
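A likely reason one function needs a Python list is the meaning of the + operator: on lists it concatenates, while on NumPy arrays it adds element-wise with broadcasting. The short sketch below uses made-up sample values to show the difference for the expression used in daveldito.

```python
import numpy as np

values = [1, 2, 3, 4]          # hypothetical sample data
as_array = np.array(values)

# With lists, + concatenates: [1, 2] + [3] -> [1, 2, 3]
from_list = values[0:2] + [values[2]]

# With arrays, + broadcasts and adds: [1, 2] + [3] -> [4, 5]
from_array = as_array[0:2] + [as_array[2]]

print(from_list)   # [1, 2, 3]
print(from_array)  # [4 5]
```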
Can numpy strides stride only within subarrays?
Sure, that's possible with np.lib.stride_tricks.as_strided. Here's one way -
from numpy.lib.stride_tricks import as_strided
L = 2 # window length
shp = a.shape
strd = a.strides
out_shp = shp[0],shp[1]-L+1,L
out_strd = strd + (strd[1],)
out = as_strided(a, out_shp, out_strd).reshape(-1,L)
Sample input, output -
In [177]: a
Out[177]:
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
In [178]: out
Out[178]:
array([[0, 1],
[1, 2],
[2, 3],
[4, 5],
[5, 6],
[6, 7]])
Note that the final reshape forces a copy. That can't be avoided if the final output has to be 2D. If a 3D output is acceptable, skip the reshape and get a view instead, as shown with the sample case -
In [181]: np.shares_memory(a, out)
Out[181]: False
In [182]: as_strided(a, out_shp, out_strd)
Out[182]:
array([[[0, 1],
[1, 2],
[2, 3]],
[[4, 5],
[5, 6],
[6, 7]]])
In [183]: np.shares_memory(a, as_strided(a, out_shp, out_strd) )
Out[183]: True
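For comparison, here is a sketch of the same per-row windowing with sliding_window_view (NumPy 1.20 or later), which was not part of the original answer. Passing axis=1 slides the window only within each row, so windows never cross row boundaries; the variable names win3d and win2d are made up for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(8).reshape(2, 4)   # same sample input as above
L = 2                            # window length

win3d = sliding_window_view(a, L, axis=1)  # shape (2, 3, 2), read-only view
win2d = win3d.reshape(-1, L)               # flattening to 2D forces a copy

print(np.shares_memory(a, win3d))  # True
print(np.shares_memory(a, win2d))  # False
```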
How can I divide a numpy array into n sub-arrays using a sliding window of size m?
I think your current method does not produce what you are describing.
Here is a faster method that splits a long array into many sub-arrays using a list comprehension:
Code Fix:
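import numpy as np
x = np.arange(10000)
# Each row is a window of 11 consecutive elements. As in the original loop,
# range(len(x)-11) stops one window short of the end of x.
T = np.array([np.array(x[i:i+11]) for i in range(len(x)-11)])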
Speed Comparison:
sample_1 = '''
import numpy as np
x = np.arange(10000)
T = np.array([])
for i in range(len(x)-11):
    s = x[i:i+11]
    T = np.concatenate((T, s),axis=0)
'''
sample_2 = '''
import numpy as np
x = np.arange(10000)
T = np.array([])
T = np.array([np.array(x[i:i+11]) for i in range(len(x)-11)])
'''
# Testing the times
import timeit
print(timeit.timeit(sample_1, number=1))
print(timeit.timeit(sample_2, number=1))
Speed Comparison Output:
5.839815437000652 # Your method
0.11047088200211874 # List Comprehension
I only checked 1 iteration as the difference is quite significant and many iterations would not change the overall outcome.
Output Comparison:
# Your method:
[ 0.00000000e+00 1.00000000e+00 2.00000000e+00 ..., 9.99600000e+03
9.99700000e+03 9.99800000e+03]
# Using List Comprehension:
[[ 0 1 2 ..., 8 9 10]
[ 1 2 3 ..., 9 10 11]
[ 2 3 4 ..., 10 11 12]
...,
[9986 9987 9988 ..., 9994 9995 9996]
[9987 9988 9989 ..., 9995 9996 9997]
[9988 9989 9990 ..., 9996 9997 9998]]
You can see that my method actually produces sub-arrays, unlike what your provided code does.
Note:
These tests were carried out on x, a NumPy array of the ordered integers from 0 to 9999 (np.arange(10000)).
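One detail worth checking in the output above is the window count: an array of length n has n - m + 1 full windows of size m, so looping over range(len(x) - m) leaves out the last one (the final row shown ends at 9998, not 9999). A small sketch with a shorter, made-up array illustrates this; the names short and full are chosen only for this example.

```python
import numpy as np

x = np.arange(15)   # small stand-in for the 10000-element array
m = 11              # window size

# Same bound as the code above: misses the final window.
short = np.array([x[i:i + m] for i in range(len(x) - m)])

# Bound that includes every full window.
full = np.array([x[i:i + m] for i in range(len(x) - m + 1)])

print(short.shape)  # (4, 11)
print(full.shape)   # (5, 11)
print(full[-1])     # last window ends at x[-1]
```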
Extract subarrays from 1D array given start indices - Python / NumPy
Use broadcasted addition to create all those indices and then index -
all_idx = ids[:,None]+range(4) # or np.add.outer(ids, range(4))
out = arr[all_idx]
Using the np.lib.stride_tricks.as_strided based strided_app -
strided_app(arr, 4, S=1)[ids]
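The broadcasted-index approach can be sketched end to end as follows. The answer does not show arr or ids, so the sample data, the width variable, and the bounds check are assumptions added for illustration.

```python
import numpy as np

arr = np.arange(10, 30)        # hypothetical 1D data: 10, 11, ..., 29
ids = np.array([0, 5, 12])     # hypothetical window start indices
width = 4                      # window length used in the answer

# Guard against windows that would run past either end of the array.
if ids.min() < 0 or ids.max() + width > arr.size:
    raise IndexError("a window would run past the array bounds")

# (3, 1) start offsets + (4,) in-window offsets -> (3, 4) index grid
all_idx = ids[:, None] + np.arange(width)
out = arr[all_idx]
print(out)
```

Fancy indexing like this always returns a copy, so out can be modified without affecting arr.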
How can I generate an array based on the grouped values of another array in Numpy?
How about this?
import numpy as np
from numpy.lib import stride_tricks
Z = np.arange(1,15,dtype=np.uint32)
# Strides are given in bytes; each uint32 item is 4 bytes wide.
R = stride_tricks.as_strided(Z,(11,4),(4,4))
print(R)
Output:
[[ 1 2 3 4]
[ 2 3 4 5]
[ 3 4 5 6]
[ 4 5 6 7]
[ 5 6 7 8]
[ 6 7 8 9]
[ 7 8 9 10]
[ 8 9 10 11]
[ 9 10 11 12]
[10 11 12 13]
[11 12 13 14]]
As navneethc rightly pointed out, this function should be used with caution.
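One concrete reason for that caution is aliasing: the rows of the strided result overlap in memory, so a single write shows up in several rows and in the original array. The sketch below demonstrates this on the same data and then shows the writeable=False option of as_strided; the variable names step and safe are made up for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

Z = np.arange(1, 15, dtype=np.uint32)
step = Z.strides[0]  # 4 bytes for uint32, taken from the array itself
R = as_strided(Z, shape=(11, 4), strides=(step, step))

# R[0, 1] and R[1, 0] are the same memory location as Z[1].
R[0, 1] = 99
print(Z[1], R[1, 0])  # 99 99

# A read-only view turns such accidental writes into errors.
safe = as_strided(Z, shape=(11, 4), strides=(step, step), writeable=False)
try:
    safe[0, 0] = 0
except ValueError as exc:
    print("write rejected:", exc)
```

A shape or stride that reaches past the end of the buffer is not checked by as_strided at all, which is the other half of the warning.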
Is there a fast way to get all neighbor elements in a list?
Use as_strided from the stride_tricks module -
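import numpy as np
from numpy.lib.stride_tricks import as_strided
# letters must be a 1D NumPy array of single characters (dtype '<U1'),
# not a plain Python list.
n = len(letters) - 1
m = np.array(letters[0]).itemsize
arr = as_strided(letters, shape=(n, 2), strides=(m, m))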
Out[289]:
array([['a', 'b'],
['b', 'c'],
['c', 'd'],
['d', 'e'],
['e', 'f'],
['f', 'g'],
['g', 'h'],
['h', 'i'],
['i', 'j'],
['j', 'k']], dtype='<U1')
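Since the question is about a list, a complete sketch may help. The input below is an assumed list of the letters a to k, which matches the output shown. For a plain list, zip gives the neighbor pairs directly; the strided version needs the list converted to an array first.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

letters = list("abcdefghijk")  # hypothetical input matching the output above

# Pure-Python neighbors: pair each item with its successor.
pairs = list(zip(letters, letters[1:]))

# Strided neighbors: convert to an ndarray, then view overlapping pairs.
arr = np.array(letters)        # dtype '<U1'
step = arr.strides[0]          # bytes per element
view = as_strided(arr, shape=(len(arr) - 1, 2),
                  strides=(step, step), writeable=False)

print(pairs[:2])
print(view[:2])
```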