How to Understand Numpy Strides for Layman

How to understand numpy strides for layman?

The actual data of a numpy array is stored in a homogeneous and contiguous block of memory called data buffer. For more information see NumPy internals.
Using the (default) row-major order, a 2D array looks like this:

Sample Image

To map the indices i,j,k,... of a multidimensional array to the positions in the data buffer (the offset, in bytes), NumPy uses the notion of strides.
Strides are the number of bytes to jump-over in the memory in order to get from one item to the next item along each direction/dimension of the array. In other words, it's the byte-separation between consecutive items for each dimension.

For example:

>>> a = np.arange(1,10).reshape(3,3)
>>> a
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

This 2D array has two directions, axes-0 (running vertically downwards across rows), and axis-1 (running horizontally across columns), with each item having size:

>>> a.itemsize  # in bytes
4

So to go from a[0, 0] -> a[0, 1] (moving horizontally along the 0th row, from the 0th column to the 1st column) the byte-step in the data buffer is 4. Same for a[0, 1] -> a[0, 2], a[1, 0] -> a[1, 1] etc. This means that the number of strides for the horizontal direction (axis-1) is 4 bytes.

However, to go from a[0, 0] -> a[1, 0] (moving vertically along the 0th column, from the 0th row to the 1st row), you need first to traverse all the remaining items on the 0th row to get to the 1st row, and then move through the 1st row to get to the item a[1, 0], i.e. a[0, 0] -> a[0, 1] -> a[0, 2] -> a[1, 0]. Therefore the number of strides for the vertical direction (axis-0) is 3*4 = 12 bytes. Note that going from a[0, 2] -> a[1, 0], and in general from the last item of the i-th row to the first item of the (i+1)-th row, is also 4 bytes because the array a is stored in the row-major order.

That's why

>>> a.strides  # (strides[0], strides[1])
(12, 4)

Here's another example showing that the strides in the horizontal direction (axis-1), strides[1], of a 2D array is not necessary equal to the item size (e.g. an array with column-major order):

>>> b = np.array([[1, 4, 7],
                  [2, 5, 8],
                  [3, 6, 9]]).T
>>> b
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

>>> b.strides
(4, 12)

Here strides[1] is a multiple of the item-size. Although the array b looks identical to the array a, it's a different array: internally b is stored as |1|4|7|2|5|8|3|6|9| (because transposing doesn't affect the data buffer but only swaps the strides and the shape), whereas a as |1|2|3|4|5|6|7|8|9|. What makes them look alike is the different strides. That is, the byte-step for b[0, 0] -> b[0, 1] is 3*4=12 bytes and for b[0, 0] -> b[1, 0] is 4 bytes, whereas for a[0, 0] -> a[0, 1] is 4 bytes and for a[0, 0] -> a[1, 0] is 12 bytes.

Last but not least, NumPy allows to create views of existing arrays with the option of modifying the strides and the shape, see stride tricks. For example:

>>> np.lib.stride_tricks.as_strided(a, shape=a.shape[::-1], strides=a.strides[::-1])
array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

which is equivalent to transposing the array a.

Let me just add, but without going into much detail, that one can even define strides that are not multiples of the item size. Here's an example:

>>> a = np.lib.stride_tricks.as_strided(np.array([1, 512, 0, 3], dtype=np.int16), 
                                        shape=(3,), strides=(3,))
>>> a
array([1, 2, 3], dtype=int16)

>>> a.strides[0]
3

>>> a.itemsize
2

Confusion about numpy strides

The little-endian memory layout of np.array([1, 512, 0, 3], dtype=np.int16) actually looks like this in memory (due to being little-endian, the individual entry bytes are actually in the reverse order from how you would write them):

(value)
-----------------------------------------------------------------------------------------
         1           |        512           |         0            |          3
-----------------------------------------------------------------------------------------
0000 0001  0000 0000 | 0000 0000  0000 0010 | 0000 0000  0000 0000 | 0000 0011  0000 0000
-----------------------------------------------------------------------------------------
        0          1           2          3           4          5           6          7
(byte number)

stride=3 means to move 3 bytes between items, so you'll get byte numbers 0-1, 3-4, 6-7.

These are 0000 0001 0000 0000, 0000 0010 0000 0000, 0000 0011 0000 0000, again interpreted as little-endian.

The strides of an array tell us how many bytes we have to skip in memory to move to the next position along a certain axis.

https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.strides.html

Can numpy strides stride only within subarrays?

Sure, that's possible with np.lib.stride_tricks.as_strided. Here's one way -

from numpy.lib.stride_tricks import as_strided

L = 2 # window length
shp = a.shape
strd = a.strides

out_shp = shp[0],shp[1]-L+1,L
out_strd = strd + (strd[1],)

out = as_strided(a, out_shp, out_strd).reshape(-1,L)

Sample input, output -

In [177]: a
Out[177]: 
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [178]: out
Out[178]: 
array([[0, 1],
       [1, 2],
       [2, 3],
       [4, 5],
       [5, 6],
       [6, 7]])

Note that the last step of reshaping forces it to make a copy there. But that's can't be avoided if we need the final output to be a 2D. If we are okay with a 3D output, skip that reshape and thus achieve a view, as shown with the sample case -

In [181]: np.shares_memory(a, out)
Out[181]: False

In [182]: as_strided(a, out_shp, out_strd)
Out[182]: 
array([[[0, 1],
        [1, 2],
        [2, 3]],

       [[4, 5],
        [5, 6],
        [6, 7]]])

In [183]: np.shares_memory(a, as_strided(a, out_shp, out_strd) )
Out[183]: True

padding numpy strides on left and right side

You always pad L - 1 elements. This is too much in must cases, but in case of right-padding it does not matter because superfluous elements of the end of the array are ignored. However, in case of left padding this is noticable because all elements from the beginning end up in the array.

In the particular case you insert 29 elements but you need only 28, so the last item in the array is dropped from the sliding window.

The number of elements to insert should depend on both array size and window size like this:

L - (a.shape[0] - 1) % L - 1

This will evaluate to the number of elements required to make the array size an integer multiple of the window size (which can be 0 if they already match).

reshaping numpy array into small subslices

You can use numpy stride tricks (numpy.lib.stride_tricks.as_strided) to create a new view of the array. This will be faster than any list comprehension because no data are copied. The IPython Cookbook has more examples of using stride tricks.

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
bytes_per_item = a.dtype.itemsize
b = np.lib.stride_tricks.as_strided(
    a, shape=(8, 3), strides=(bytes_per_item, bytes_per_item))

array([[ 1,  2,  3],
       [ 2,  3,  4],
       [ 3,  4,  5],
       [ 4,  5,  6],
       [ 5,  6,  7],
       [ 6,  7,  8],
       [ 7,  8,  9],
       [ 8,  9, 10]])

Timed tests

This answer is orders of magnitude faster than answers here that use loops. Find the tests below (done in Jupyter Notebook with %timeit magic). Note that one of the functions does not work properly with numpy arrays and requires a Python list.

Setup

import numpy as np

a = np.arange(1, 100001, dtype=np.int64)
a_list = a.tolist()

def jakub(a, shape):
    a = np.asarray(a)
    bytes_per_item = a.dtype.itemsize
    # The docs for this function recommend setting `writeable=False` to
    # prevent modifying the underlying array.
    return np.lib.stride_tricks.as_strided(
        a, shape=shape, strides=(bytes_per_item, bytes_per_item), writeable=False)

# https://stackoverflow.com/a/63426256/5666087
def daveldito(arr):
    return np.array([arr[each:each+2]+[arr[each+2]] for each in range(len(arr)-2)])

# https://stackoverflow.com/a/63426205/5666087
def akshay_sehgal(a):
    return np.array([i for i in zip(a,a[1:],a[2:])])

Results

%timeit jakub(a, shape=(a.shape[0]-2, 3))
8.85 µs ± 425 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit daveldito(a_list)
141 ms ± 8.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit akshay_sehgal(a)
168 ms ± 9.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Einsum formula for repeating dimensions

It is not possible to get this result using np.einsum() alone, but you can try this:

import numpy as np
from numpy.lib.stride_tricks import as_strided

m, n, o = 2, 3, 5
np.random.seed(0)
other = np.random.rand(m, n, o)
prev = np.random.rand(m, n, o, m, n, o)
mu = np.zeros((m, n, o, m, n, o))

mu_view = as_strided(mu,
                     shape=(m, n, o),
                     strides=[sum(mu.strides[i::3]) for i in range(3)]
                     )
np.einsum('cijcij,cij->cij', prev, other, out=mu_view)

The array mu should be then the same as the one produced by the code using nested loops in the question.

Some explanation. Regardless of a shape of a numpy array, internally its elements are stored in a contiguous block of memory. Part of the structure of an array are strides, which specify how many bytes one needs to jump when one of the indices of an array element is incremented by 1. Thus, in a 2-dimensional array arr, arr.stride[0] is the number of bytes separating an element arr[i, j] from arr[i+1, j] and arr.stride[1] is the number of bytes separating arr[i, j] from a[i, j+1]. Using the strides information numpy can find a given element in an array based on its indices. See e.g. this post for more details.

numpy.lib.stride_tricks.as_strided is a function that creates a view of a given array with custom-made strides. By specifying strides, one can change which array element corresponds to which indices. In the code above this is used to create mu_view, which is a view of mu with the property, that the element mu_view[c, i, j] is the element mu[c, i, j, c, i, j]. This is done by specifying strides of mu_view in terms of strides of mu. For example, the distance between mu_view[c, i, j] and mu_view[c+1, i, j] is set to be the distance between mu[c, i, j, c, i, j] and mu[c+1, i, j, c+1, i, j], which is mu.strides[0] + mu.strides[3].

How does numpy reshaping of a slice works

I like to look at a.__array_interface__ which includes a databuffer 'pointer'.. From that you can see the c offset. I suspect d is a copy, no longer a view. This should be obvious from interface.

In [155]: a=np.arange(20)
In [156]: a.__array_interface__
Out[156]: 
{'data': (34619344, False),
 'strides': None,
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (20,),
 'version': 3}
In [157]: b = a.reshape(5,4)
In [158]: b.__array_interface__
Out[158]: 
{'data': (34619344, False),       # same
 'strides': None,
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (5, 4),
 'version': 3}
In [159]: c = b[1:,1:]
In [160]: c.__array_interface__
Out[160]: 
{'data': (34619384, False),       # offset by 32+8
 'strides': (32, 8),
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (4, 3),
 'version': 3}
In [161]: d=c.reshape(3,4)
In [162]: d.__array_interface__
Out[162]: 
{'data': (36537904, False),       # totally different
 'strides': None,
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (3, 4),
 'version': 3}

How to Understand Numpy Strides for Layman