Using Strides for an Efficient Moving Average Filter

For what it's worth, here's how you'd do it using "fancy" striding tricks. I was going to post this yesterday, but got distracted by actual work! :)

@Paul & @eat both have nice implementations using various other ways of doing this. Just to continue things from the earlier question, I figured I'd post the N-dimensional equivalent.

You're not going to be able to significantly beat scipy.ndimage functions for >1D arrays, however. (scipy.ndimage.uniform_filter should beat scipy.ndimage.convolve, though)

Moreover, if you're trying to get a multidimensional moving window, you risk having memory usage blow up whenever you inadvertently make a copy of your array. While the initial "rolling" array is just a view into the memory of your original array, any intermediate step that copies it will produce an array that is orders of magnitude larger than the original. For example, say you're working with a 100x100 original array: the view into it (for a filter size of (3,3)) will be 98x98x3x3 but use the same memory as the original, whereas any copy will use the full amount of memory that a 98x98x3x3 array requires!

Basically, using crazy striding tricks is great when you want to vectorize moving window operations on a single axis of an ndarray. It makes it really easy to calculate things like a moving standard deviation with very little overhead. When you want to start doing this along multiple axes, it's possible, but you're usually better off with more specialized functions (such as scipy.ndimage).
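
For instance, a moving standard deviation along one axis is a single strided view plus one reduction. Here's a minimal sketch, assuming NumPy >= 1.20 for np.lib.stride_tricks.sliding_window_view (a modern wrapper around the same trick):

import numpy as np

x = np.random.rand(1000)
# A (996, 5) view into x -- no copy is made here.
windows = np.lib.stride_tricks.sliding_window_view(x, 5)
moving_std = windows.std(axis=-1)  # moving std over a window of 5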

At any rate, here's how you do it:

import numpy as np

def rolling_window_lastaxis(a, window):
    """Directly taken from Erik Rigtorp's post to numpy-discussion.
    <http://www.mail-archive.com/numpy-discussion@scipy.org/msg29450.html>"""
    if window < 1:
        raise ValueError("`window` must be at least 1.")
    if window > a.shape[-1]:
        raise ValueError("`window` is too long.")
    # Append a trailing window axis that reuses the last stride, so the
    # result is a view into `a`, not a copy.
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

def rolling_window(a, window):
    if not hasattr(window, '__iter__'):
        return rolling_window_lastaxis(a, window)
    for i, win in enumerate(window):
        if win > 1:
            a = a.swapaxes(i, -1)
            a = rolling_window_lastaxis(a, win)
            a = a.swapaxes(-2, i)
    return a

filtsize = (3, 3)
a = np.zeros((10, 10), dtype=float)
a[5:7, 5] = 1

b = rolling_window(a, filtsize)
blurred = b.mean(axis=-1).mean(axis=-1)

So what we get when we do b = rolling_window(a, filtsize) is an 8x8x3x3 array that's actually a view into the same memory as the original 10x10 array. We could just as easily have used different filter sizes along different axes or operated only along selected axes of an N-dimensional array (i.e. filtsize = (0,3,0,3) on a 4-dimensional array would give us a 6-dimensional view).

We can then apply an arbitrary function to the last axis repeatedly to effectively calculate things in a moving window.
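
For instance, a moving standard deviation over the same 3x3 windows is just one more reduction over the trailing window axes. A minimal sketch using the b from above (note that the reshape copies, since b isn't contiguous):

# Collapse the two window axes into one, then reduce.
moving_std = b.reshape(b.shape[:-2] + (-1,)).std(axis=-1)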

However, because we're storing temporary arrays that are much bigger than our original array on each step of mean (or std or whatever), this is not at all memory efficient! It's also not going to be terribly fast, either.

The equivalent for ndimage is just:

import scipy.ndimage

blurred = scipy.ndimage.uniform_filter(a, filtsize, output=a)

This will handle a variety of boundary conditions, do the "blurring" in-place without requiring a temporary copy of the array, and be very fast. Striding tricks are a good way to apply a function to a moving window along one axis, but they're not a good way to do it along multiple axes, usually....

Just my $0.02, at any rate...

Adding a unique-value filter to strided moving windows in Python

Here's one approach with scikit-image's view_as_windows for efficient sliding window extraction.

Steps involved :

  • Get sliding windows.

  • Reshape into a 2D array. Note that this makes a copy, so we lose the memory efficiency of views, but the computation stays vectorized.

  • Sort along the axis formed by the merged block axes.

  • Take the difference along that axis and count the nonzero changes; adding 1 gives the count of unique values in each sliding window, and hence the final expected result.

The implementation would be like so -

import numpy as np
from skimage.util import view_as_windows as viewW

def sliding_uniq_count(a, BSZ):
    out_shp = np.asarray(a.shape) - BSZ + 1
    a_slid4D = viewW(a, BSZ)
    a_slid2D = np.sort(a_slid4D.reshape(-1, np.prod(BSZ)), axis=1)
    return ((a_slid2D[:, 1:] != a_slid2D[:, :-1]).sum(1) + 1).reshape(out_shp)

Sample run -

In [233]: a = np.random.randint(0,10,(6,7))

In [234]: a
Out[234]:
array([[6, 0, 5, 7, 0, 8, 5],
       [3, 0, 7, 1, 5, 4, 8],
       [5, 0, 5, 1, 7, 2, 3],
       [5, 1, 3, 3, 7, 4, 9],
       [9, 0, 7, 4, 9, 1, 1],
       [7, 0, 4, 1, 6, 3, 4]])

In [235]: sliding_uniq_count(a, [3,3])
Out[235]:
array([[5, 4, 4, 7, 7],
       [5, 5, 4, 6, 7],
       [6, 6, 6, 6, 6],
       [7, 5, 6, 6, 6]])

Hybrid approach

To make it work with very large arrays that wouldn't otherwise fit into memory, we might have to keep one loop that iterates along each row of the input data, like so -

def sliding_uniq_count_oneloop(a, BSZ):
    S = np.prod(BSZ)
    out_shp = np.asarray(a.shape) - BSZ + 1
    a_slid4D = viewW(a, BSZ)
    out = np.empty(out_shp, dtype=int)
    for i in range(a_slid4D.shape[0]):
        a_slid2D_i = np.sort(a_slid4D[i].reshape(-1, S), -1)
        out[i] = (a_slid2D_i[:, 1:] != a_slid2D_i[:, :-1]).sum(-1) + 1
    return out

Hybrid approach - Version II

Another version of the hybrid one, this time with explicit usage of np.lib.stride_tricks.as_strided (renamed here to avoid clashing with the version above) -

def sliding_uniq_count_oneloop_v2(a, BSZ):
    S = np.prod(BSZ)
    out_shp = np.asarray(a.shape) - BSZ + 1
    strd = np.lib.stride_tricks.as_strided
    m, n = a.strides
    N = out_shp[1]
    out = np.empty(out_shp, dtype=int)
    for i in range(out_shp[0]):
        # Strided (N, BSZ[0], BSZ[1]) view of all windows starting in row i.
        a_slid3D = strd(a[i], shape=((N,) + tuple(BSZ)), strides=(n, m, n))
        a_slid2D_i = np.sort(a_slid3D.reshape(-1, S), -1)
        out[i] = (a_slid2D_i[:, 1:] != a_slid2D_i[:, :-1]).sum(-1) + 1
    return out

How to calculate rolling / moving average using python + NumPy / SciPy?

A simple way to achieve this is by using np.convolve.
The idea behind this is to leverage the way the discrete convolution is computed and use it to return a rolling mean. This can be done by convolving with a sequence of np.ones of a length equal to the sliding window length we want.

In order to do so we could define the following function:

def moving_average(x, w):
    return np.convolve(x, np.ones(w), 'valid') / w

This function takes the convolution of the sequence x with a sequence of ones of length w. Note that the chosen mode is 'valid', so the convolution product is only given for points where the sequences overlap completely.


Some examples:

x = np.array([5,3,8,10,2,1,5,1,0,2])

For a moving average with a window of length 2 we would have:

moving_average(x, 2)
# array([4. , 5.5, 9. , 6. , 1.5, 3. , 3. , 0.5, 1. ])

And for a window of length 4:

moving_average(x, 4)
# array([6.5 , 5.75, 5.25, 4.5 , 2.25, 1.75, 2. ])

How does convolve work?

Let's take a more in-depth look at the way the discrete convolution is computed.
The following function aims to replicate the way np.convolve computes the output values:

def mov_avg(x, w):
    for m in range(len(x) - (w-1)):
        yield sum(np.ones(w) * x[m:m+w]) / w

Which, for the same example above would also yield:

list(mov_avg(x, 2))
# [4.0, 5.5, 9.0, 6.0, 1.5, 3.0, 3.0, 0.5, 1.0]

So what is being done at each step is to take the inner product between the array of ones and the current window. In this case the multiplication by np.ones(w) is superfluous given that we are directly taking the sum of the sequence.
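
In other words, the loop could be written without the array of ones at all; an equivalent sketch:

def mov_avg_simple(x, w):
    # Identical output: the inner product with np.ones(w) is just a sum.
    for m in range(len(x) - (w - 1)):
        yield sum(x[m:m+w]) / w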

Below is an example of how the first outputs are computed so that it is a little clearer. Let's suppose we want a window of w=4:

[1, 1, 1, 1]
[5, 3, 8, 10, 2, 1, 5, 1, 0, 2]
= (1*5 + 1*3 + 1*8 + 1*10) / w = 6.5

And the following output would be computed as:

   [1, 1, 1,  1]
[5, 3, 8, 10, 2, 1, 5, 1, 0, 2]
= (1*3 + 1*8 + 1*10 + 1*2) / w = 5.75

And so on, returning a moving average of the sequence once all overlaps have been performed.

Improving runtime of weighted moving average filter function?

That slicing and summing/averaging on a weighted window basically corresponds to 1D convolution with the kernel being flipped. Now, for 1D convolution, NumPy has a very efficient implementation in np.convolve and that could be used to get rid of the loop and give us y_avg. Thus, we would have a vectorized implementation like so -

y_sums = np.convolve(y,weights[::-1],'valid')
y_avg = np.true_divide(y_sums,weights.sum())
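
As a quick sanity check with hypothetical sample data (the y and weights values here are made up, just to illustrate the call):

import numpy as np

y = np.array([1., 2., 3., 4., 5., 6.])
weights = np.array([0.2, 0.3, 0.5])  # most recent sample weighted last

y_sums = np.convolve(y, weights[::-1], 'valid')
y_avg = np.true_divide(y_sums, weights.sum())
# y_avg[0] == (0.2*1 + 0.3*2 + 0.5*3) / 1.0 == 2.3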

Moving average or running mean

For a short, fast solution that does the whole thing in one loop, without dependencies, the code below works great.

mylist = [1, 2, 3, 4, 5, 6, 7]
N = 3
cumsum, moving_aves = [0], []

for i, x in enumerate(mylist, 1):
    cumsum.append(cumsum[i-1] + x)
    if i >= N:
        moving_ave = (cumsum[i] - cumsum[i-N])/N
        # can do stuff with moving_ave here
        moving_aves.append(moving_ave)
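
For the list above this yields moving_aves == [2.0, 3.0, 4.0, 5.0, 6.0] (under Python 3 division; in Python 2 you'd want to divide by float(N)).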

Sliding window using as_strided function in numpy?

Check out the answers to this question: Using strides for an efficient moving average filter. Basically strides are not a great option, although they work.

How to efficiently compute average on the fly (moving average)?

Your solution is essentially the standard optimal online solution for keeping a running average: it avoids storing big sums, runs "online" (i.e. it processes one number at a time without going back to earlier numbers), and uses only a constant amount of extra memory. If you want a solution that is slightly better in terms of numerical accuracy, at the cost of no longer being online, then (assuming your numbers are all non-negative) sort them from smallest to largest first and then process them in that order, the same way you do now. That way, if you get a bunch of small, roughly equal numbers followed by one big number, you will be able to compute the average accurately without underflow, as opposed to processing the large number first.
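
For reference, here's a minimal sketch of the standard online update being described (the incremental-mean form; the function name is just illustrative):

def online_average(xs):
    # Incremental mean: avg_n = avg_{n-1} + (x_n - avg_{n-1}) / n.
    # One pass, constant extra memory, no large running sum.
    avg = 0.0
    for n, x in enumerate(xs, 1):
        avg += (x - avg) / n
    return avg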

Windowed/ sliding mean filter accounting for valid members at boundary

Get the windowed summations and divide by the valid members in each window. We can use scipy.signal.convolve2d to get both and hence have a solution like so -

import numpy as np
from scipy.signal import convolve2d

def windowed_average(a, kernel_size, mode='same'):
    k = np.ones((kernel_size, kernel_size), dtype=int)
    window_sum = convolve2d(a, k, mode)
    # Convolving a ones-mask with the same kernel counts the valid
    # members in each window, which handles the boundaries.
    window_count = convolve2d(np.ones(a.shape, dtype=bool), k, mode)
    return window_sum/window_count

Alternative #1

Alternatively, if you want to make use of uniform_filter to get the windowed summations, we can do so and that might be more efficient as well, like so -

from scipy.ndimage import uniform_filter

n = kernel_size**2
window_sum = uniform_filter(a, kernel_size, mode='constant', cval=0.0)*n
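
Putting that together into a drop-in variant of windowed_average is a small step. A sketch, assuming the same zero-padded boundary: uniform_filter computes a windowed mean, so multiplying by the window size n recovers the sum, and the same trick on an array of ones recovers the per-window count:

import numpy as np
from scipy.ndimage import uniform_filter

def windowed_average_uniform(a, kernel_size):
    n = kernel_size**2
    # Zero-padded windowed sums of the data and of a ones-mask
    # (floating-point, so the counts may carry tiny rounding error).
    window_sum = uniform_filter(a, kernel_size, mode='constant', cval=0.0)*n
    window_count = uniform_filter(np.ones(a.shape), kernel_size,
                                  mode='constant', cval=0.0)*n
    return window_sum/window_count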

Sample runs -

In [54]: a
Out[54]:
array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [55]: windowed_average(a, kernel_size=3)
Out[55]:
array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [56]: b
Out[56]:
array([[1., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [57]: windowed_average(b, kernel_size=3)
Out[57]:
array([[0.25      , 0.16666667, 0.        ],
       [0.16666667, 0.11111111, 0.        ],
       [0.        , 0.        , 0.        ]])

Fast way to take average of every N rows in a .npy array

The mean of two values a and b is 0.5*(a+b). Therefore you can do it like this:

newArray = 0.5*(originalArray[0::2] + originalArray[1::2])

It sums each pair of consecutive rows and, at the end, multiplies every element by 0.5.
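
For example (hypothetical data, just to show the call):

import numpy as np

originalArray = np.array([[1., 2.],
                          [3., 4.],
                          [5., 6.],
                          [7., 8.]])
newArray = 0.5*(originalArray[0::2] + originalArray[1::2])
# array([[2., 3.],
#        [6., 7.]])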

Since in the title you are asking for avg over N rows, here is a more general solution:

def groupedAvg(myArray, N=2):
    result = np.cumsum(myArray, 0)[N-1::N]/float(N)
    result[1:] = result[1:] - result[:-1]
    return result

The general form of the average over n elements is sum([x1,x2,...,xn])/n.
The sum of elements m through m+n-1 of a vector v is the (m+n-1)-th element of cumsum(v) minus the (m-1)-th element, except when m is 0, in which case you don't subtract anything (result[0]).
That is what we take advantage of here. Also, since everything is linear, it doesn't matter where we divide by N, so we do it right at the beginning; that is just a matter of taste.
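
A small worked example of the cumsum trick, with N=2 and made-up data:

import numpy as np

v = np.array([1., 2., 3., 4., 5., 6.])
# cumsum(v)             -> [ 1,  3,  6, 10, 15, 21]
# every 2nd entry / 2   -> [1.5, 5.0, 10.5]
# adjacent differences  -> [1.5, 3.5, 5.5]  (the pairwise means)
print(groupedAvg(v, N=2))  # [1.5 3.5 5.5]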

If the last group has less than N elements, it will be ignored completely.
If you don't want to ignore it, you have to treat the last group specially:

def avg(myArray, N=2):
    cum = np.cumsum(myArray, 0)
    result = cum[N-1::N]/float(N)
    result[1:] = result[1:] - result[:-1]

    remainder = myArray.shape[0] % N
    if remainder != 0:
        if remainder < myArray.shape[0]:
            # Average only over the leftover rows.
            lastAvg = (cum[-1] - cum[-1-remainder])/float(remainder)
        else:
            # The array has fewer than N rows in total.
            lastAvg = cum[-1]/float(remainder)
        result = np.vstack([result, lastAvg])

    return result
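
For instance, with 5 rows and N=2 (a quick hypothetical check), the leftover row forms its own group:

import numpy as np

arr = np.arange(10.).reshape(5, 2)
print(avg(arr, N=2))
# [[1. 2.]    mean of rows 0-1
#  [5. 6.]    mean of rows 2-3
#  [8. 9.]]   leftover row on its own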

