Efficient Numpy 2D array construction from 1D array
Actually, there's an even more efficient way to do this. The downside to using vstack, etc., is that you're making a copy of the array.
Incidentally, this is effectively identical to @Paul's answer, but I'm posting this just to explain things in a bit more detail...
There's a way to do this with just views so that no memory is duplicated.
I'm directly borrowing this from Erik Rigtorp's post to numpy-discussion; he in turn borrowed it from Keith Goodman's Bottleneck (which is quite useful!).
The basic trick is to directly manipulate the strides of the array (for one-dimensional arrays):
import numpy as np

def rolling(a, window):
    # Assumes `a` is a contiguous 1-D array, so one element step is a.itemsize bytes
    shape = (a.size - window + 1, window)
    strides = (a.itemsize, a.itemsize)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

a = np.arange(10)
print(rolling(a, 3))
Here a is your input array and window is the length of the window that you want (3, in your case).
This yields:
[[0 1 2]
[1 2 3]
[2 3 4]
[3 4 5]
[4 5 6]
[5 6 7]
[6 7 8]
[7 8 9]]
However, there is absolutely no duplication of memory between the original a and the returned array: the result is just a view into a's buffer. This means that creating it is fast and scales much better than the copying approaches.
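A quick way to confirm that the result is a view rather than a copy is np.shares_memory. The sketch below reuses the rolling function from above and also shows the flip side of sharing memory: a write to the original array shows up in every window that covers that element.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def rolling(a, window):
    # Same construction as above; assumes `a` is a contiguous 1-D array
    shape = (a.size - window + 1, window)
    strides = (a.itemsize, a.itemsize)
    return as_strided(a, shape=shape, strides=strides)

a = np.arange(10)
view = rolling(a, 3)

# The result shares a's buffer instead of holding its own copy
print(np.shares_memory(view, a))  # True

# Element 5 of `a` appears in three overlapping windows; all of them see the write
a[5] = 100
print(view[3].tolist(), view[4].tolist(), view[5].tolist())
# [3, 4, 100] [4, 100, 6] [100, 6, 7]
```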
For example, using a = np.arange(100000) and window=3 (this IPython session is Python 2, hence xrange):
%timeit np.vstack([a[i:i-window] for i in xrange(window)]).T
1000 loops, best of 3: 256 us per loop
%timeit rolling(a, window)
100000 loops, best of 3: 12 us per loop
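A Python 3 version of that comparison might look like the sketch below. Note that the slice a[i:i-window] in the session above stops one element early in every column, so it returns one window fewer than rolling; the sketch uses the stop index n - window + 1 + i so both functions return the same array. The function name rolling_vstack is just a label for this example, and the printed timings will depend entirely on your machine and NumPy version, so none are quoted here.

```python
import timeit
import numpy as np
from numpy.lib.stride_tricks import as_strided

def rolling(a, window):
    # Strided view, as above; assumes a contiguous 1-D array
    shape = (a.size - window + 1, window)
    strides = (a.itemsize, a.itemsize)
    return as_strided(a, shape=shape, strides=strides)

def rolling_vstack(a, window):
    # Copying approach; stop index n - window + 1 + i keeps the final window
    n = a.size
    return np.vstack([a[i:n - window + 1 + i] for i in range(window)]).T

a = np.arange(100000)
window = 3

# Both approaches produce the same values; only one of them copies
assert np.array_equal(rolling_vstack(a, window), rolling(a, window))

for name, func in [("vstack", rolling_vstack), ("as_strided", rolling)]:
    total = timeit.timeit(lambda: func(a, window), number=1000)
    print(f"{name}: {total / 1000 * 1e6:.1f} us per call")
```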
If we generalize this to a "rolling window" along the last axis for an N-dimensional array, we get Erik Rigtorp's "rolling window" function:
import numpy as np

def rolling_window(a, window):
    """
    Make an ndarray with a rolling window of the last dimension

    Parameters
    ----------
    a : array_like
        Array to add rolling window to
    window : int
        Size of rolling window

    Returns
    -------
    Array that is a view of the original array with an added dimension
    of size `window`.

    Examples
    --------
    >>> x = np.arange(10).reshape((2, 5))
    >>> rolling_window(x, 3)
    array([[[0, 1, 2], [1, 2, 3], [2, 3, 4]],
           [[5, 6, 7], [6, 7, 8], [7, 8, 9]]])

    Calculate rolling mean of last dimension:

    >>> np.mean(rolling_window(x, 3), -1)
    array([[ 1., 2., 3.],
           [ 6., 7., 8.]])
    """
    a = np.asarray(a)  # accept any array_like, as the docstring promises
    if window < 1:
        raise ValueError("`window` must be at least 1.")
    if window > a.shape[-1]:
        raise ValueError("`window` is too long.")
    # The last axis becomes (number of windows, window); the new axis reuses the last stride
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
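If you are on NumPy 1.20 or later, numpy.lib.stride_tricks.sliding_window_view builds the same kind of windowed view without hand-written strides. The sketch below reproduces the docstring example above with it; one design difference worth knowing is that its result is read-only by default, which protects against accidentally writing through overlapping windows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

x = np.arange(10).reshape((2, 5))

# Windows of length 3 along the last axis, same layout as rolling_window(x, 3)
windows = sliding_window_view(x, 3, axis=-1)
print(windows.shape)            # (2, 3, 3)
print(windows.flags.writeable)  # False: read-only view by default

# Rolling mean of the last dimension, as in the docstring
print(windows.mean(axis=-1))
```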
So, let's look into what's going on here. Manipulating an array's strides may seem a bit magical, but once you understand what's going on, it's not at all. The strides of a numpy array describe the size in bytes of the steps that must be taken to increment one value along a given axis. So, in the case of a 1-dimensional array of 64-bit floats, the length of each item is 8 bytes, and x.strides is (8,).
x = np.arange(9, dtype=np.float64)  # 64-bit floats, 8 bytes per item
print(x.strides)  # (8,)
Now, if we reshape this into a 2D, 3x3 array, the strides will be (3 * 8, 8), as we would have to jump 24 bytes to increment one step along the first axis, and 8 bytes to increment one step along the second axis.
y = x.reshape(3, 3)
print(y.strides)  # (24, 8)
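The rule behind those numbers, for a C-ordered (row-major) array, is that the stride of each axis is the item size times the product of the lengths of all the axes after it. The helper c_order_strides below is hypothetical, written only to make that formula concrete, and it assumes C-contiguous layout.

```python
import numpy as np

def c_order_strides(shape, itemsize):
    # Stride of axis k = itemsize * product of the lengths of axes k+1, k+2, ...
    strides = []
    step = itemsize
    for n in reversed(shape):
        strides.append(step)
        step *= n
    return tuple(reversed(strides))

y = np.arange(9, dtype=np.float64).reshape(3, 3)
print(c_order_strides(y.shape, y.itemsize))  # (24, 8)
print(y.strides)                             # (24, 8)
```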
Similarly, a transpose is the same as just reversing the strides of an array:
print(y)
y.strides = y.strides[::-1]  # reverses the strides of y in place
print(y)
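Setting y.strides mutates y itself. A non-mutating way to see the same relationship is to compare y.T with a view built by as_strided from the reversed shape and strides; this sketch assumes a freshly created, C-ordered y.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

y = np.arange(9, dtype=np.float64).reshape(3, 3)

# .T is a view whose shape and strides are both reversed; nothing is copied
t = y.T
print(y.strides, t.strides)    # (24, 8) (8, 24)
print(np.shares_memory(y, t))  # True

# Building the same view by hand from the reversed shape and strides
manual = as_strided(y, shape=y.shape[::-1], strides=y.strides[::-1])
print(np.array_equal(manual, y.T))  # True
```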
Clearly, the strides of an array and the shape of an array are intimately linked. If we change one, we have to change the other accordingly, otherwise we won't have a valid description of the memory buffer that actually holds the values of the array.
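Therefore, if you want to change both the shape and the strides of an array at the same time, you can't do it just by setting x.strides and x.shape one after the other, even if the final strides and shape are compatible with each other: whichever one you set first is checked against the other's old value.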
That's where numpy.lib.stride_tricks.as_strided comes in. It's actually a very simple function that just sets the strides and shape of an array simultaneously.
It doesn't check the new shape against the old strides, as would happen if you set the two independently. In fact, it doesn't check the result against the underlying memory buffer at all, so an incorrect shape or strides silently produces a view that points at memory outside the array. (It does this through numpy's __array_interface__, which allows arbitrary classes to describe a memory buffer as a numpy array.)
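Because as_strided performs no bounds check, one option is to wrap it with your own. The function checked_as_strided below is hypothetical, not part of NumPy: it assumes a C-contiguous input and non-negative strides, computes the byte offset of the last element the view would touch, and refuses views that would run past the buffer. It also passes writeable=False (a real as_strided parameter) because overlapping windows should not be written through.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def checked_as_strided(a, shape, strides):
    """Hypothetical wrapper: reject views that would reach past a's buffer.

    Assumes `a` is C-contiguous and every stride is non-negative.
    """
    if not a.flags.c_contiguous:
        raise ValueError("expected a C-contiguous array")
    if any(s < 0 for s in strides):
        raise ValueError("negative strides are not handled here")
    if all(n > 0 for n in shape):
        # Byte offset of the last element in the view, plus that element's size
        last_byte = sum((n - 1) * s for n, s in zip(shape, strides)) + a.itemsize
        if last_byte > a.nbytes:
            raise ValueError(f"view needs {last_byte} bytes, buffer has {a.nbytes}")
    return as_strided(a, shape=shape, strides=strides, writeable=False)

a = np.arange(10)
ok = checked_as_strided(a, (8, 3), (a.itemsize, a.itemsize))
print(ok[-1].tolist())  # [7, 8, 9]

try:
    checked_as_strided(a, (9, 3), (a.itemsize, a.itemsize))  # one window too many
except ValueError as err:
    print("rejected:", err)
```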
So, all we've done is make the array step one item forward (8 bytes in the case of a 64-bit array) along one axis, and also only one item (8 bytes) forward along the other axis.
In other words, with a "window" size of 3, the array has a shape of (whatever, 3), but instead of stepping a full 3 * x.itemsize to move from one row to the next, it only steps one item forward, effectively making the rows of the new array a "moving window" view into the original array.
(This also means that, for your new array, x.shape[0] * x.shape[1] will generally be larger than the number of elements in the original array, because the windows overlap.)
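The address arithmetic makes this concrete: element [i, j] of the view lives at byte offset i * strides[0] + j * strides[1] from the start of the buffer, and with both strides equal to itemsize that is (i + j) * itemsize, which is exactly a[i + j]. The sketch below checks that identity for every element and shows the size mismatch.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(10)
window = 3
view = as_strided(a, shape=(a.size - window + 1, window),
                  strides=(a.itemsize, a.itemsize))

# Offset of [i, j] is i*itemsize + j*itemsize = (i + j)*itemsize, i.e. a[i + j]
rows, cols = view.shape
assert all(view[i, j] == a[i + j] for i in range(rows) for j in range(cols))

# The view reports 8 * 3 = 24 elements, but only 10 values exist in memory
print(view.size, a.size)  # 24 10
```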
At any rate, hopefully that makes things slightly clearer.
2D array Of All Cyclic Shifts Of A 1D array
You are actually building a circulant matrix. Just use SciPy's circulant function. Be careful, because you must pass in the first column, not the first row:
>>> from scipy.linalg import circulant
>>> circulant([1, 4, 3, 2])
array([[1, 2, 3, 4],
       [4, 1, 2, 3],
       [3, 4, 1, 2],
       [2, 3, 4, 1]])
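If you'd rather not depend on SciPy, the same matrix can be built with fancy indexing, since entry [i, j] of a circulant matrix with first column c is c[(i - j) % n]. The helpers circulant_np and left_shifts below are hypothetical names for this sketch; left_shifts covers the other common reading of "all cyclic shifts", where row k is np.roll(a, -k).

```python
import numpy as np

def circulant_np(c):
    # Mirrors scipy.linalg.circulant: `c` is the first column, C[i, j] = c[(i - j) % n]
    c = np.asarray(c)
    n = c.size
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return c[idx]

def left_shifts(a):
    # Row k is `a` rotated left by k positions, i.e. np.roll(a, -k)
    a = np.asarray(a)
    n = a.size
    return a[(np.arange(n)[:, None] + np.arange(n)[None, :]) % n]

print(circulant_np([1, 4, 3, 2]))
print(left_shifts([1, 2, 3, 4]))
```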
For reference, circulant matrices have very nice properties; for example, every circulant matrix is diagonalized by the discrete Fourier transform.
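One practical consequence: multiplying a circulant matrix by a vector is a circular convolution with its first column, so it can be done with FFTs in O(n log n) without forming the matrix, and the eigenvalues are the DFT of the first column, with Fourier vectors as eigenvectors. The example vectors below are arbitrary.

```python
import numpy as np

c = np.array([1.0, 4.0, 3.0, 2.0])  # first column, as passed to circulant()
x = np.array([0.5, -1.0, 2.0, 3.0])
n = c.size

# Dense circulant matrix: C[i, j] = c[(i - j) % n]
C = c[(np.arange(n)[:, None] - np.arange(n)[None, :]) % n]

# C @ x equals the circular convolution of c and x, computed via FFTs
y_fft = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real
print(np.allclose(C @ x, y_fft))  # True

# Column k of V is the Fourier vector exp(2*pi*i*j*k/n); C scales it by fft(c)[k]
lam = np.fft.fft(c)
k = np.arange(n)
V = np.exp(2j * np.pi * np.outer(k, k) / n)
print(np.allclose(C @ V, V * lam))  # True
```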
Create a two-dimensional array with two one-dimensional arrays
If you wish to combine two 10-element one-dimensional arrays into a two-dimensional array, np.vstack((tp, fp)).T will do it.
np.vstack((tp, fp)) will return an array of shape (2, 10), and the T attribute returns the transposed array with shape (10, 2) (i.e., with the two one-dimensional arrays forming columns rather than rows).
>>> tp = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> tp.ndim
1
>>> tp.shape
(10,)
>>> fp = np.array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
>>> fp.ndim
1
>>> fp.shape
(10,)
>>> combined = np.vstack((tp, fp)).T
>>> combined
array([[ 0, 10],
[ 1, 11],
[ 2, 12],
[ 3, 13],
[ 4, 14],
[ 5, 15],
[ 6, 16],
[ 7, 17],
[ 8, 18],
[ 9, 19]])
>>> combined.ndim
2
>>> combined.shape
(10, 2)
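NumPy also has functions that place 1-D arrays as columns directly, without the transpose step: np.column_stack, and np.stack with axis=1. For two equal-length 1-D arrays, all three produce the same (10, 2) result, as this sketch checks.

```python
import numpy as np

tp = np.arange(10)
fp = np.arange(10, 20)

a = np.vstack((tp, fp)).T
b = np.column_stack((tp, fp))   # 1-D inputs become the columns
c = np.stack((tp, fp), axis=1)  # stacks along a new last axis

print(np.array_equal(a, b) and np.array_equal(b, c))  # True
print(b.shape)  # (10, 2)
```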