Rolling window for 1D arrays in Numpy?
Just use the blog code, but apply your function to the result, i.e.
np.std(rolling_window(observations, n), axis=1)
where you have (from the blog):
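import numpy as np

def rolling_window(a, window):
    # One row per window position along the last axis; each row has `window` items.
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    # Reuse the last-axis stride so consecutive windows overlap by window - 1 items.
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)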
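As a quick sanity check of the helper above, here is a minimal sketch that compares the strided result against an explicit loop. The sample values and the window length are made up for illustration.

```python
import numpy as np

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

# Hypothetical sample data, not taken from the question.
observations = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
n = 3

# Standard deviation of every length-n window, computed in one call.
rolled = np.std(rolling_window(observations, n), axis=1)

# The same thing with a plain loop, for comparison.
looped = np.array([observations[i:i + n].std()
                   for i in range(len(observations) - n + 1)])

print(np.allclose(rolled, looped))
```

The two results should agree, and the strided version does not copy the input.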
Numpy rolling window over 2D array, as a 1D array with nested array as data values
What was wrong with your as_strided trial? It works for me.
In [28]: x=np.arange(1,11.).reshape(5,2)
In [29]: x.shape
Out[29]: (5, 2)
In [30]: x.strides
Out[30]: (16, 8)
In [31]: np.lib.stride_tricks.as_strided(x,shape=(3,3,2),strides=(16,16,8))
Out[31]:
array([[[  1.,   2.],
        [  3.,   4.],
        [  5.,   6.]],

       [[  3.,   4.],
        [  5.,   6.],
        [  7.,   8.]],

       [[  5.,   6.],
        [  7.,   8.],
        [  9.,  10.]]])
On my first edit I used an int array, so I had to use (8,8,4) for the strides. Strides are given in bytes, so they depend on the dtype.
Your shape could be wrong. If too large it starts seeing values off the end of the data buffer.
[[ 7.00000000e+000, 8.00000000e+000],
[ 9.00000000e+000, 1.00000000e+001],
[ 8.19968827e-257, 5.30498948e-313]]])
Here the stray values only change how the array is displayed (scientific notation); the 7, 8, 9, 10 are still there. Writing to those out-of-bounds slots could be dangerous, messing up other parts of your code. as_strided is best used for read-only purposes. Writes/sets are trickier.
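To avoid hard-coding byte counts, the strides can be read from the array itself. The sketch below is one way to do that, assuming a window of 3 rows as in the session above. It also passes writeable=False, which as_strided accepts, so that accidental writes raise an error.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Integer dtype on purpose: the byte strides differ from the float example,
# but the code below does not care.
x = np.arange(1, 11).reshape(5, 2)

window = 3                               # rows per window (assumed)
n_windows = x.shape[0] - window + 1      # keeps every window inside the buffer
row_stride, col_stride = x.strides

windows = as_strided(
    x,
    shape=(n_windows, window, x.shape[1]),
    strides=(row_stride, row_stride, col_stride),
    writeable=False,                     # read-only view
)
print(windows.shape)
```

Computing n_windows from the shape keeps the view inside the data buffer, which is the failure mode described above.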
How to get the rolling(window = 3).max() on a numpy.ndarray?
You can use np.lib.stride_tricks.as_strided:
# a smaller example
import numpy as np
import numpy.random as npr
npr.seed(123)
arr = npr.randn(10)
arr[:4] = np.nan
# 8 windows of length 3; float64 items are 8 bytes apart
windows = np.lib.stride_tricks.as_strided(arr, shape=(8, 3), strides=(8, 8))
print(windows.max(axis=1))
print(windows.sum(axis=1))
[ nan nan nan nan 1.65143654 1.65143654
1.26593626 1.26593626]
[ nan nan nan nan -1.35384296 -1.20415534
-1.58965561 -0.02971677]
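On NumPy 1.20 or later, sliding_window_view builds the same kind of windowed view without manual shape and stride arithmetic. A small sketch with made-up data follows. Whether NaN should propagate (max) or be ignored (nanmax) is an assumption to adjust to your use case.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical data with one leading NaN.
arr = np.array([np.nan, 2.0, 5.0, 1.0, 4.0, 3.0])

windows = sliding_window_view(arr, 3)        # shape (4, 3), read-only view

rolling_max = windows.max(axis=1)            # NaN propagates into the result
rolling_nanmax = np.nanmax(windows, axis=1)  # NaN is ignored within each window

print(rolling_max)
print(rolling_nanmax)
```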
Sliding standard deviation on a 1D NumPy array
You could create a 2D array of sliding windows with np.lib.stride_tricks.as_strided that would be views into the given 1D array and as such won't occupy any more memory. Then, simply use np.std along the second axis (axis=1) for the final result in a vectorized way, like so:
W = 10 # Window size
nrows = a.size - W + 1
n = a.strides[0]
a2D = np.lib.stride_tricks.as_strided(a,shape=(nrows,W),strides=(n,n))
out = np.std(a2D, axis=1)
Runtime test
Function definitions -
def original_app(a, W):
    b = np.empty(a.size-W+1)
    for i in range(b.size):
        b[i] = np.std(a[i:i+W])
    return b

def vectorized_app(a, W):
    nrows = a.size - W + 1
    n = a.strides[0]
    a2D = np.lib.stride_tricks.as_strided(a,shape=(nrows,W),strides=(n,n))
    return np.std(a2D,1)
Timings and verification -
In [460]: # Inputs
...: a = np.arange(10000)
...: W = 10
...:
In [461]: np.allclose(original_app(a, W), vectorized_app(a, W))
Out[461]: True
In [462]: %timeit original_app(a, W)
1 loops, best of 3: 522 ms per loop
In [463]: %timeit vectorized_app(a, W)
1000 loops, best of 3: 1.33 ms per loop
So, around a 400x speedup there!
For completeness, here's the equivalent pandas version -
import pandas as pd

def pdroll(a, W): # a is 1D ndarray and W is window-size
    return pd.Series(a).rolling(W).std(ddof=0).values[W-1:]
Sliding window of M-by-N shape numpy.ndarray
In [1]: import numpy as np
In [2]: a = np.array([[0,1], [10,11], [20,21], [30,31], [40,41], [50,51]])
In [3]: w = np.hstack((a[:-2],a[1:-1],a[2:]))
In [4]: w
Out[4]:
array([[ 0, 1, 10, 11, 20, 21],
[10, 11, 20, 21, 30, 31],
[20, 21, 30, 31, 40, 41],
[30, 31, 40, 41, 50, 51]])
You could write this as a function like so:
def window_stack(a, stepsize=1, width=3):
    n = a.shape[0]
    # np.hstack expects a sequence, so build a list rather than a generator
    return np.hstack([a[i:1+n+i-width:stepsize] for i in range(0,width)])
This doesn't really depend on the shape of the original array, as long as a.ndim == 2. Note that I never use either length in the interactive version. The second dimension of the shape is irrelevant; each row can be as long as you want. Thanks to @Jaime's suggestion, you can do it without checking the shape at all:
def window_stack(a, stepsize=1, width=3):
    # `or None` turns a stop of 0 (last slice) into "run to the end"
    return np.hstack([a[i:1+i-width or None:stepsize] for i in range(0,width)])
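To illustrate what stepsize does, here is a small sketch using the shape-free version with a list passed to np.hstack. The 6-by-2 input is made up for the example.

```python
import numpy as np

def window_stack(a, stepsize=1, width=3):
    # Slice i covers rows i, i+stepsize, ... ; stacking them side by side
    # puts `width` consecutive rows into each output row.
    return np.hstack([a[i:1 + i - width or None:stepsize] for i in range(width)])

# Hypothetical input: 6 rows, 2 columns.
a = np.arange(12).reshape(6, 2)

w1 = window_stack(a)               # every window: rows (0,1,2), (1,2,3), ...
w2 = window_stack(a, stepsize=2)   # every second window: rows (0,1,2), (2,3,4)

print(w1.shape, w2.shape)
```

With stepsize=2 the windows start at rows 0 and 2, so consecutive windows overlap by one row instead of two.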
Rolling statistics performance: pandas vs. numpy strides
TL;DR: The two versions use very different algorithms.
The sliding_window_view trick is fine for solving the rolling-average problem with a small window, but it is neither a clean nor an efficient way to do it, especially with a big window. Indeed, NumPy computes a plain mean, not a rolling average, so it has no way of knowing that the user is using strides to compute something else. The provided NumPy implementation runs in O(n * w) time, where n is the array size and w the window size. Pandas does know that a rolling average is being computed, so it uses a much more efficient algorithm that runs in O(n) time. For more information about it please read this post.
Here is a much faster Numpy implementation:
cumsum = np.cumsum(data)
invSize = 1. / window
(cumsum[window-1:] - np.concatenate([[0], cumsum[:-window]])) * invSize
Here are the performance results on my machine:
Naive Numpy version: 193.2 ms
Pandas version: 33.1 ms
Fast Numpy version: 8.5 ms
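To check that the cumulative-sum approach returns the same values as the windowed mean, here is a sketch that wraps both in functions and compares them. The function names, the random input and the window size are assumptions made for this example.

```python
import numpy as np

def rolling_mean_cumsum(data, window):
    # O(n): each window sum is a difference of two running totals.
    cumsum = np.cumsum(data)
    shifted = np.concatenate([[0], cumsum[:-window]])
    return (cumsum[window - 1:] - shifted) / window

def rolling_mean_naive(data, window):
    # O(n * w): every window is averaged independently.
    windows = np.lib.stride_tricks.sliding_window_view(data, window)
    return windows.mean(axis=1)

rng = np.random.default_rng(0)
data = rng.normal(size=1000)
window = 50

fast = rolling_mean_cumsum(data, window)
naive = rolling_mean_naive(data, window)
print(np.allclose(fast, naive))
```

One caveat: for very long arrays or values of widely different magnitude, the running total can accumulate floating-point error, so the two results agree only up to a tolerance.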
How to speed up Numpy array slicing within a for loop?
As RandomGuy suggested, you can use stride_tricks:
np.lib.stride_tricks.as_strided(original,(i_range,k),(8,8))
For larger arrays (and larger i_range and k) this is probably the most efficient option, as it does not allocate any additional memory. There is a drawback: editing the created array modifies the original array as well, unless you make a copy.
The (8,8) parameter defines how many bytes in memory you advance in each direction. I use 8 because that is the original array's stride.
Another option, which works better for smaller arrays:
def myfunc2():
    i_s = np.arange(i_range).reshape(-1,1)+np.arange(k)
    return original[i_s]
This is faster than your original version.
Neither, however, is 100x faster.
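The two options produce the same windows but differ in memory behaviour. The sketch below shows this with made-up values for original, i_range and k. The stride is read from the array instead of being hard-coded to 8.

```python
import numpy as np

# Hypothetical stand-ins for the question's variables.
original = np.arange(20, dtype=np.float64)
i_range, k = 5, 4

# Option 1: strided view, no new data buffer.
step = original.strides[0]
strided = np.lib.stride_tricks.as_strided(
    original, (i_range, k), (step, step), writeable=False
)

# Option 2: index array built by broadcasting, then fancy indexing (a copy).
i_s = np.arange(i_range).reshape(-1, 1) + np.arange(k)
indexed = original[i_s]

print(np.array_equal(strided, indexed))
print(np.shares_memory(strided, original), np.shares_memory(indexed, original))
```

The strided result shares memory with original, while the fancy-indexed result is an independent copy that is safe to modify.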
Rolling minimum of an array so that first values are minimum of window, not NaN
I think you need DataFrame.bfill, which fills the leading NaN rows with the first complete window's minimum:
>>> df_min = df_q.rolling(3).min().bfill()
>>> df_min
min_q
0 3.437455
1 3.437455
2 3.437455
3 1.978533
4 -0.504468
5 -0.504468
6 -0.504468
7 -0.766392
8 -0.766392
9 -0.766392
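For a reproducible illustration, here is a sketch on a tiny made-up frame; only the column name min_q comes from the output above. It also shows min_periods=1 as a possible alternative, which uses the partial window for the first rows instead of back-filling. Which of the two is wanted depends on how the question is read.

```python
import pandas as pd

# Hypothetical values, not the data from the question.
df_q = pd.DataFrame({"min_q": [3.0, 1.0, 2.0, 5.0, 4.0]})

# Rolling minimum over 3 rows; the first two rows are NaN, then back-filled
# with the minimum of the first complete window.
df_min = df_q.rolling(3).min().bfill()

# Alternative: allow partial windows at the start (1 or 2 rows).
df_partial = df_q.rolling(3, min_periods=1).min()

print(df_min["min_q"].tolist())
print(df_partial["min_q"].tolist())
```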