What is the difference between flatten and ravel functions in numpy?
The current API is that:
- `flatten` always returns a copy.
- `ravel` returns a view of the original array whenever possible. This isn't visible in the printed output, but if you modify the array returned by `ravel`, it may modify the entries in the original array. If you modify the entries in an array returned from `flatten`, this will never happen. `ravel` will often be faster since no memory is copied, but you have to be more careful about modifying the array it returns.
- `reshape((-1,))` gets a view whenever the strides of the array allow it, even if that means you don't always get a contiguous array.
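A small sketch of those three behaviours, using `np.shares_memory` to tell views from copies. The array `a` and the every-other-column slice `a[:, ::2]` are illustrative choices; the last case relies on `ravel` always returning a contiguous result, so a non-contiguous input forces a copy, while `reshape(-1)` can still find a strided view.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # C-contiguous 2-D array

f = a.flatten()                   # always a new copy
r = a.ravel()                     # a view here, because a is C-contiguous
print(np.shares_memory(f, a))     # False
print(np.shares_memory(r, a))     # True

r[0] = 100                        # writing through the view changes a
print(a[0, 0])                    # 100
print(f[0])                       # 0: the flatten copy is unaffected

# Every other column: not contiguous, but one 1-D stride still covers it
s = a[:, ::2]
print(np.shares_memory(s.reshape(-1), a))  # True: reshape returns a strided view
print(np.shares_memory(s.ravel(), a))      # False: ravel must return a contiguous array
```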
What is the difference between flatten and ravel in numpy?
Aha:
The primary functional difference is that `flatten` is a method of an ndarray object and hence can only be called on true numpy arrays. In contrast, `ravel()` is also available as a library-level function (`np.ravel`), which can be called on any object that can be converted to an array. For example, `np.ravel()` will work on a list of ndarrays (of matching shapes), while `flatten` (obviously) won't.
In addition, as @jonrsharpe pointed out in his comment, the flatten method always returns a copy, while ravel only does so "if needed." Still not quite sure how this determination is made.
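A minimal illustration of the function-versus-method point, assuming a list of equal-length arrays (recent NumPy versions raise an error when asked to build an array from ragged input like this):

```python
import numpy as np

arrays = [np.array([1, 2, 3]), np.array([4, 5, 6])]   # a plain Python list

flat = np.ravel(arrays)            # the list is converted to an array first, then raveled
print(flat)                        # [1 2 3 4 5 6]

print(hasattr(arrays, "flatten"))  # False: a list has no flatten method
print(np.array(arrays).flatten())  # [1 2 3 4 5 6]: convert first, then the method works
```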
numpy difference between flat and ravel()
`flat` is an iterator. It is a separate object that just happens to give access to the array elements via indexing. Its main purpose is to be used in loops and comprehension expressions. The order it gives is the same as the one you would generally get from `ravel`.
Unlike the result of `ravel`, `flat` is not an `ndarray`, so it cannot do much besides indexing the array and iterating over it. Notice that you had to call `list` to view the contents of the iterator. For example, `arr.flat.min()` would fail with an `AttributeError`, while `arr.ravel().min()` would give the same result as `arr.min()`.
Since `numpy` provides so many operations that do not require explicit loops to be written, `ndarray.flat`, and iterators in general, are rarely used compared to `ndarray.ravel()`.
That being said, there are situations where an iterator is preferable. If your array is large enough and you are trying to inspect all the elements one-by-one, an iterator would work well. This is especially true if you have something like a memory-mapped array that gets loaded in portions.
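A short sketch of what `flat` does and does not support; the small array `arr` is just for illustration:

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

print(type(arr.flat).__name__)                 # flatiter: not an ndarray
print(list(arr.flat) == arr.ravel().tolist())  # True: same element order as ravel
print(arr.flat[4])                             # 4: indexes as if the array were 1-D

print(hasattr(arr.flat, "min"))                # False: arr.flat.min() raises AttributeError
print(arr.ravel().min() == arr.min())          # True

# Element-by-element loop without building a flattened array first
total = 0
for value in arr.flat:
    total += value
print(total)                                   # 15
```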
Differences between X.ravel() and X.reshape(s0*s1*s2) when number of axes known
Look at their `__array_interface__` and do some timings. The only difference that I can see is that `ravel` is faster.
`.flatten()` has a more significant difference: it returns a copy.
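The shared-buffer part of that comparison can be checked by reading the data pointer out of `__array_interface__`. This is a sketch; `data_ptr` is a hypothetical helper introduced only for this example, and timings are left out because they vary by machine.

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)

def data_ptr(x):
    # Hypothetical helper: address of the first element, from the array interface
    return x.__array_interface__["data"][0]

print(data_ptr(A.ravel()) == data_ptr(A))      # True: same buffer
print(data_ptr(A.reshape(-1)) == data_ptr(A))  # True: same buffer
print(data_ptr(A.flatten()) == data_ptr(A))    # False: a new buffer
```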
`A.reshape(-1)` is a simpler way to use reshape.
You could study the respective docs and see if there is something else. I haven't explored what happens when you specify `order`.
I would use `ravel` if I just want it to be 1d. I use `.reshape` most often to change a 1d array (e.g. from `arange()`) to nd.
e.g.
```python
np.arange(10).reshape(2,5).ravel()
```
Or choose the one that makes your code most readable.
`reshape` and `ravel` are defined in `numpy` C code:
In https://github.com/numpy/numpy/blob/0703f55f4db7a87c5a9e02d5165309994b9b13fd/numpy/core/src/multiarray/shape.c
`PyArray_Ravel(PyArrayObject *arr, NPY_ORDER order)` requires nearly 100 lines of C code, and it punts to `PyArray_Flatten` if the order changes.
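To see the "order changes" case from Python, this sketch ravels a transposed (Fortran-contiguous) array with different `order` values. The expected view/copy results assume `ravel` only avoids a copy when the requested order matches a contiguous layout of the input; the array `A` is just an illustration.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)
T = A.T                       # Fortran-contiguous view of A's buffer

c = T.ravel()                 # default order='C': T is not C-contiguous, so this copies
k = T.ravel(order="K")        # 'K' follows memory order, so a view is possible
f = T.ravel(order="F")        # Fortran order matches T's layout: also a view

print(c, np.shares_memory(c, A))  # [0 3 1 4 2 5] False
print(k, np.shares_memory(k, A))  # [0 1 2 3 4 5] True
print(f, np.shares_memory(f, A))  # [0 1 2 3 4 5] True
```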
In the same file, `reshape` punts to `newshape`. That in turn returns a view if the shape doesn't actually change, tries `_attempt_nocopy_reshape`, and as a last resort returns a `PyArray_NewCopy`.
Both make use of `PyArray_Newshape` and `PyArray_NewFromDescr`, depending on how shapes and order mix and match.
So identifying where reshape (to 1d) and ravel are different would require careful study.
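Another way to do this kind of ravel is to make a new array, with a new shape, but the same data buffer (passing `dtype`, since `np.ndarray` otherwise assumes float64):
```python
np.ndarray((24,), dtype=A.dtype, buffer=A.data)
```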
It times the same as `reshape`. Its `__array_interface__` is the same. I don't recommend using this method, but it may clarify what is going on with these reshape/ravel functions. They all make a new array, with a new shape, but with shared data (if possible). Timing differences are the result of different sequences of function calls (in Python and C), not of different handling of the data.
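A runnable version of that buffer trick, assuming `A` is a C-contiguous 24-element array (the shape `(2, 3, 4)` is an illustrative choice):

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)   # must be contiguous for A.data to be usable here

# New 1-D array header over A's existing buffer; dtype must match A
B = np.ndarray((24,), dtype=A.dtype, buffer=A.data)

print(np.shares_memory(B, A))        # True
print(np.array_equal(B, A.ravel()))  # True
print(B.__array_interface__["data"][0] == A.__array_interface__["data"][0])  # True
```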
Python numpy ravel function not flattening array
```
In [455]: x = np.array([np.array(['0 <= ... < 200 DM', '< 0 DM', 'no checking account'], dtype=object),
     ...:
     ...:        np.array(['critical account/ other credits existing (not at this bank)',
     ...:        'existing credits paid back duly till now'], dtype=object),
     ...:        np.array(['(vacation - does not exist?)', 'domestic appliances'],
     ...:        dtype=object)], dtype=object)
In [456]: x
Out[456]:
array([array(['0 <= ... < 200 DM', '< 0 DM', 'no checking account'], dtype=object),
       array(['critical account/ other credits existing (not at this bank)',
       'existing credits paid back duly till now'], dtype=object),
       array(['(vacation - does not exist?)', 'domestic appliances'],
       dtype=object)], dtype=object)
In [457]: x.shape
Out[457]: (3,)
In [458]: [i.shape for i in x]
Out[458]: [(3,), (2,), (2,)]
```
`x` is a 1d array with 3 elements. Those elements are themselves arrays, with differing shapes.
One way to flatten it is:
```
In [459]: np.hstack(x)
Out[459]:
array(['0 <= ... < 200 DM', '< 0 DM', 'no checking account',
       'critical account/ other credits existing (not at this bank)',
       'existing credits paid back duly till now',
       '(vacation - does not exist?)', 'domestic appliances'],
      dtype=object)
In [460]: _.shape
Out[460]: (7,)
```
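A self-contained sketch of why `ravel` leaves such an array alone: it only sees the 3 outer object elements. The short strings here are hypothetical stand-ins for the original labels; `np.concatenate` is used alongside `np.hstack` to join the inner arrays.

```python
import numpy as np

# A 1-D object array whose elements are arrays of different lengths
x = np.empty(3, dtype=object)
x[0] = np.array(["a", "b", "c"], dtype=object)
x[1] = np.array(["d", "e"], dtype=object)
x[2] = np.array(["f", "g"], dtype=object)

print(x.ravel().shape)           # (3,): already 1-D, so ravel changes nothing
print(np.concatenate(x).shape)   # (7,): joins the inner arrays end to end
print(np.hstack(x).tolist())     # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
```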
How do numpy authors decide whether to put a function in numpy.* vs. numpy.ndarray.*?
```
In [495]: x = np.arange(12).reshape(3,4) # reshape((3,4)) also
In [496]: x.flatten?
Docstring:
a.flatten(order='C')
Return a copy of the array collapsed into one dimension.
```
The `ravel` method and function are "equivalent":
```
In [497]: x.ravel?
Docstring:
a.ravel([order])
Return a flattened array.
Signature: np.ravel(a, order='C')
Docstring:
Return a contiguous flattened array.
A 1-D array, containing the elements of the input, is returned. A copy is
made only if needed.
```
By your terminology, `flatten` is out-of-place, `ravel` is not. Or in `numpy`'s terms, `ravel` usually produces a view, rather than a copy.
The actual code for `np.ravel` is:
```python
if isinstance(a, np.matrix):
    return asarray(a).ravel(order=order)
else:
    return asanyarray(a).ravel(order=order)
```
If the argument is not an array, it is turned into one. Then the method is used.
This pattern is quite common. The function does an `asarray` if needed, and then delegates the action to the method.
`np.reshape` and `x.reshape` follow this pattern. There is also an `x.shape = ...` form that is a real in-place action. The two `reshape` forms return a view where possible (reshaping never changes the total number of elements). This view shares data, but has its own `shape` and `strides`.
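A sketch of the three spellings side by side; `dtype=np.int64` is fixed here only so the printed strides are predictable:

```python
import numpy as np

x = np.arange(12, dtype=np.int64)

y = np.reshape(x, (3, 4))       # function form
z = x.reshape(3, 4)             # method form; x.reshape((3, 4)) also works
print(np.array_equal(y, z))     # True
print(np.shares_memory(z, x))   # True: a view over x's data
print(x.strides, z.strides)     # (8,) (32, 8): the view has its own strides

x.shape = (4, 3)                # in-place: x itself now has a new shape
print(x.shape)                  # (4, 3)
```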
`resize` is one of the function/method pairs that has significant differences between the two. We don't use it much.
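One concrete difference, sketched below: `np.resize` returns a new array filled by repeating the data, while the `ndarray.resize` method changes the array in place and zero-fills the new slots. `refcheck=False` is passed only to avoid the reference-count check that can refuse the resize in interactive sessions.

```python
import numpy as np

a = np.arange(4)

b = np.resize(a, 6)           # function: new array, data repeated to fill it
print(b)                      # [0 1 2 3 0 1]

a.resize(6, refcheck=False)   # method: a changes in place, new slots are zeros
print(a)                      # [0 1 2 3 0 0]
```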
The `repeat` function is the same as the method. Because it normally changes the number of elements, `repeat` (both forms) returns a new array, with its own data. It does not return a view.
`sum` is another pair that returns a new array. It changes the number of elements, so a view isn't possible.
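A quick check that both `repeat` and `sum` hand back new data rather than views; the array `x` is illustrative:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

r1 = np.repeat(x, 2)               # function form (flattens, since no axis is given)
r2 = x.repeat(2)                   # method form, same result
print(np.array_equal(r1, r2))      # True
print(r1.size, x.size)             # 12 6: the element count changed
print(np.shares_memory(r1, x))     # False: new data buffer

s = x.sum(axis=0)                  # also a new array, with fewer elements
print(s, np.shares_memory(s, x))   # [3 5 7] False
```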
As for `randn`, its docs explain the difference. Specifying the shape as a tuple might well be the preferred 'standard', but this `randn` behavior is unusual. The suggested alternative for new code, `standard_normal`, takes the `size` tuple. `reshape` accepts either.
While the normal tuple syntax is `(1,2,3)`, the `()` are actually optional; it's the comma that marks the tuple. It's required in a 1-element tuple, e.g. `(1,)`. In indexing, `x[(1,2)]` and `x[1,2]` are the same, passing a `tuple` to `x.__getitem__`.
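A small demonstration of the comma rule; `ShowKey` is a hypothetical helper class that just returns whatever key Python hands to `__getitem__`:

```python
import numpy as np

t = 1, 2, 3                      # no parentheses, still a tuple
print(type(t))                   # <class 'tuple'>
print(type((1)), type((1,)))     # <class 'int'> <class 'tuple'>

x = np.arange(12).reshape(3, 4)
print(x[(1, 2)] == x[1, 2])      # True

class ShowKey:
    # Hypothetical helper: returns the key Python passes to __getitem__
    def __getitem__(self, key):
        return key

print(ShowKey()[1, 2])           # (1, 2): the comma-separated index arrives as a tuple
```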
Both Python and `numpy` have long histories. Choices made in the past are still with us in one way or another. Refining the code is slow; adding features is easier than removing them.