Rolling Window for 1D Arrays in Numpy

Rolling window for 1D arrays in Numpy?

Just use the blog code, but apply your function to the result.

i.e.

numpy.std(rolling_window(observations, n), 1)

where you have (from the blog):

import numpy as np

def rolling_window(a, window):
    # view of shape (len(a) - window + 1, window), built without copying data
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
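
For instance, a minimal usage sketch (the sample data here is made up):

observations = np.arange(10, dtype=float)
print(np.std(rolling_window(observations, 3), 1))
# eight values, one per length-3 window; each is ~0.8165 for this input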

Numpy rolling window over 2D array, as a 1D array with nested array as data values

What was wrong with your as_strided trial? It works for me.

In [28]: x=np.arange(1,11.).reshape(5,2)
In [29]: x.shape
Out[29]: (5, 2)
In [30]: x.strides
Out[30]: (16, 8)
In [31]: np.lib.stride_tricks.as_strided(x,shape=(3,3,2),strides=(16,16,8))
Out[31]:
array([[[  1.,   2.],
        [  3.,   4.],
        [  5.,   6.]],

       [[  3.,   4.],
        [  5.,   6.],
        [  7.,   8.]],

       [[  5.,   6.],
        [  7.,   8.],
        [  9.,  10.]]])

On my first edit I used an int array, so I had to use (8, 8, 4) for the strides.

Your shape could be wrong. If it is too large, the view starts picking up values from beyond the end of the data buffer:

       [[  7.00000000e+000,   8.00000000e+000],
        [  9.00000000e+000,   1.00000000e+001],
        [  8.19968827e-257,   5.30498948e-313]]])

Here that only changes what gets displayed; the 7, 8, 9, 10 are still there. Writing to those slots, however, could be dangerous and clobber other parts of your data. as_strided is best used for read-only purposes; writes/sets are trickier.
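
If you only need the windows for reading, one option (assuming NumPy 1.12 or newer, which added the writeable argument to as_strided) is to request a read-only view so accidental writes fail loudly:

x = np.arange(1, 11.).reshape(5, 2)
win = np.lib.stride_tricks.as_strided(x, shape=(3, 3, 2), strides=(16, 16, 8),
                                      writeable=False)
# win[0, 0, 0] = 99.  # would raise ValueError: assignment destination is read-only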

How to get the rolling(window = 3).max() on a numpy.ndarray?

You can use np.lib.stride_tricks.as_strided:

# a smaller example
import numpy as np
import numpy.random as npr

npr.seed(123)
arr = npr.randn(10)
arr[:4] = np.nan

# strides=(8, 8) because arr is float64 (8 bytes per element)
windows = np.lib.stride_tricks.as_strided(arr, shape=(8, 3), strides=(8, 8))

print(windows.max(axis=1))
print(windows.sum(axis=1))
[        nan         nan         nan         nan  1.65143654  1.65143654
  1.26593626  1.26593626]
[        nan         nan         nan         nan -1.35384296 -1.20415534
 -1.58965561 -0.02971677]
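
Note that strides=(8, 8) hard-codes the byte size of a float64 element. A small sketch that derives the stride from the array itself, so it also works for other dtypes:

n = arr.strides[0]  # element stride in bytes
windows = np.lib.stride_tricks.as_strided(arr, shape=(arr.size - 3 + 1, 3),
                                          strides=(n, n))
print(windows.max(axis=1))  # same result as above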

Sliding standard deviation on a 1D NumPy array

You could create a 2D array of sliding windows with np.lib.stride_tricks.as_strided; it is just a view into the given 1D array, so it does not occupy any extra memory. Then simply use np.std along the second axis (axis=1) for the final result in a vectorized way, like so -

W = 10 # Window size
nrows = a.size - W + 1
n = a.strides[0]
a2D = np.lib.stride_tricks.as_strided(a,shape=(nrows,W),strides=(n,n))
out = np.std(a2D, axis=1)

Runtime test

Function definitions -

def original_app(a, W):
    b = np.empty(a.size - W + 1)
    for i in range(b.size):
        b[i] = np.std(a[i:i+W])
    return b

def vectorized_app(a, W):
    nrows = a.size - W + 1
    n = a.strides[0]
    a2D = np.lib.stride_tricks.as_strided(a, shape=(nrows, W), strides=(n, n))
    return np.std(a2D, 1)

Timings and verification -

In [460]: # Inputs
...: a = np.arange(10000)
...: W = 10
...:

In [461]: np.allclose(original_app(a, W), vectorized_app(a, W))
Out[461]: True

In [462]: %timeit original_app(a, W)
1 loops, best of 3: 522 ms per loop

In [463]: %timeit vectorized_app(a, W)
1000 loops, best of 3: 1.33 ms per loop

So, around 400x speedup there!

For completeness, here's the equivalent pandas version -

import pandas as pd

def pdroll(a, W):  # a is a 1D ndarray and W is the window size
    return pd.Series(a).rolling(W).std(ddof=0).values[W-1:]
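
On NumPy 1.20 or newer, the same windows can be built without manual stride arithmetic via np.lib.stride_tricks.sliding_window_view; a sketch of the equivalent:

def sliding_std_view(a, W):
    # returns an (a.size - W + 1, W) read-only view, then reduces along axis=1
    return np.lib.stride_tricks.sliding_window_view(a, W).std(axis=1)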

Sliding window of M-by-N shape numpy.ndarray

In [1]: import numpy as np

In [2]: a = np.array([[0, 1], [10, 11], [20, 21], [30, 31], [40, 41], [50, 51]])

In [3]: w = np.hstack((a[:-2],a[1:-1],a[2:]))

In [4]: w
Out[4]:
array([[ 0,  1, 10, 11, 20, 21],
       [10, 11, 20, 21, 30, 31],
       [20, 21, 30, 31, 40, 41],
       [30, 31, 40, 41, 50, 51]])

You could write this as a function like so:

def window_stack(a, stepsize=1, width=3):
    n = a.shape[0]
    return np.hstack([a[i:1+n+i-width:stepsize] for i in range(0, width)])

This doesn't really depend on the shape of the original array, as long as a.ndim == 2. Note that I never use either length in the interactive version. The second dimension of the shape is irrelevant; each row can be as long as you want. Thanks to @Jaime's suggestion, you can do it without checking the shape at all:

def window_stack(a, stepsize=1, width=3):
    return np.hstack([a[i:1+i-width or None:stepsize] for i in range(0, width)])
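
A quick check of the function against the interactive example above:

a = np.array([[0, 1], [10, 11], [20, 21], [30, 31], [40, 41], [50, 51]])
print(window_stack(a))
# [[ 0  1 10 11 20 21]
#  [10 11 20 21 30 31]
#  [20 21 30 31 40 41]
#  [30 31 40 41 50 51]]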

Rolling statistics performance: pandas vs. numpy strides

TL;DR: The two versions use very different algorithms.

The sliding_window_view trick can solve the rolling-average problem with a small window, but it is neither a clean nor an efficient way to do it, especially with a big window. Indeed, NumPy computes a plain mean, not a rolling average, and has no information that the user is using strides to compute something else. The provided NumPy implementation runs in O(n * w), where n is the array size and w the window size. Pandas, on the other hand, knows that a rolling average is being computed, so it uses a much more efficient algorithm that runs in O(n) time. For more information about it, please read this post.

Here is a much faster Numpy implementation:

cumsum = np.cumsum(data)
invSize = 1. / window
# differences of cumulative sums give each window's sum in O(n)
result = (cumsum[window-1:] - np.concatenate([[0], cumsum[:-window]])) * invSize

Here are the performance results on my machine:

Naive Numpy version:  193.2 ms
Pandas version: 33.1 ms
Fast Numpy version: 8.5 ms
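
For reference, a small self-contained sketch that checks the cumulative-sum version against pandas' rolling mean (the array size and window below are arbitrary):

import numpy as np
import pandas as pd

data = np.random.rand(100000)
window = 1000

cumsum = np.cumsum(data)
fast = (cumsum[window-1:] - np.concatenate([[0], cumsum[:-window]])) / window
ref = pd.Series(data).rolling(window).mean().values[window-1:]
print(np.allclose(fast, ref))  # True, up to floating-point rounding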

How to speed up Numpy array slicing within a for loop?

As RandomGuy suggested, you can use stride_tricks:

np.lib.stride_tricks.as_strided(original,(i_range,k),(8,8))

For larger arrays (and larger i_range and k) this is probably the most efficient option, as it does not allocate any additional memory. There is a drawback: editing the created array would modify the original array as well, unless you make a copy.
The (8, 8) parameter defines how many bytes you advance in memory in each direction; I use 8 because that is the stride size of the original array.

Another option, which works better for smaller arrays:

def myfunc2():
    i_s = np.arange(i_range).reshape(-1, 1) + np.arange(k)
    return original[i_s]

This is faster than your original version.
Neither, however, is 100x faster.
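
For reference, a dtype-agnostic sketch of the as_strided call above; original, i_range and k are the names from the question:

s = original.strides[0]  # byte stride of one element, 8 for float64
windows = np.lib.stride_tricks.as_strided(original, (i_range, k), (s, s))
safe = windows.copy()  # copy only if you need to modify the result safely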

Rolling minimum of an array so that first values are minimum of window, not NaN

I think you need DataFrame.bfill

>>> df_min = df_q.rolling(3).min().bfill()
>>> df_min
      min_q
0  3.437455
1  3.437455
2  3.437455
3  1.978533
4 -0.504468
5 -0.504468
6 -0.504468
7 -0.766392
8 -0.766392
9 -0.766392
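
Alternatively, if you want the leading values to be the minimum over the partial windows rather than a backfilled copy of the first full-window result, rolling's min_periods argument handles that directly:

df_min = df_q.rolling(3, min_periods=1).min()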

