Difference between Flatten and Ravel Functions in Numpy

What is the difference between flatten and ravel functions in numpy?

The current API is that:

  • flatten always returns a copy.
  • ravel returns a view of the original array whenever possible. This isn't visible in the printed output, but if you modify the array returned by ravel, it may modify the entries in the original array. If you modify the entries in an array returned from flatten this will never happen. ravel will often be faster since no memory is copied, but you have to be more careful about modifying the array it returns.
  • reshape((-1,)) gets a view whenever the strides of the array allow it even if that means you don't always get a contiguous array.
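A minimal sketch of the three behaviours listed above, assuming small integer arrays; the variable names are made up for illustration:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)      # C-contiguous 2x3 array

flat_copy = a.flatten()             # always a copy
rav = a.ravel()                     # a view here, because a is contiguous

rav[0] = 100                        # writes through to a
flat_copy[1] = -1                   # leaves a untouched

# A strided 1-D slice is not contiguous.
base = np.arange(10)
s = base[::2]
reshape_is_view = np.shares_memory(base, s.reshape(-1))  # view that keeps the stride
ravel_is_view = np.shares_memory(base, s.ravel())        # contiguous copy instead
```

np.shares_memory is used here only as a convenient way to check whether two arrays overlap in memory.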

What is the difference between flatten and ravel in numpy?

Aha:
The primary functional difference is that flatten is a method of an ndarray object and hence can only be called on true numpy arrays. In contrast, ravel() is a library-level function and hence can be called on any object that can be converted to an array. For example, ravel() will work on a list of ndarrays, while flatten (obviously) won't.

In addition, as @jonrsharpe pointed out in his comment, the flatten method always returns a copy, while ravel only does so "if needed." Still not quite sure how this determination is made.
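To make both points concrete, here is a small sketch (the names are illustrative). It suggests that, for the default order, "if needed" comes down to whether the input is already laid out contiguously in the requested order:

```python
import numpy as np

rows = [np.array([1, 2]), np.array([3, 4])]   # a plain list, not an ndarray
flat_from_list = np.ravel(rows)               # the function converts it first
list_has_flatten = hasattr(rows, "flatten")   # lists have no flatten method

a = np.arange(6).reshape(2, 3)
t = a.T                                       # transposed view, not C-contiguous

ravel_a_shares = np.shares_memory(a, a.ravel())   # contiguous input: view
ravel_t_shares = np.shares_memory(a, t.ravel())   # non-contiguous input: copy
```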

numpy difference between flat and ravel()

flat is an iterator. It is a separate object that just happens to give access to the array elements via indexing. Its main purpose is to be used in loops and comprehension expressions. The order it gives is the same as the one you would generally get from ravel.

Unlike the result of ravel, flat is not an ndarray, so it cannot do much besides indexing the array and iterating over it. Notice that you had to call list to view the contents of the iterator. For example, arr.flat.min() would fail with an AttributeError, while arr.ravel().min() would give the same result as arr.min().
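A short sketch of that difference, using a made-up 2x2 array:

```python
import numpy as np

arr = np.array([[3, 1], [4, 2]])

kind = type(arr.flat).__name__     # the iterator type, not ndarray
as_list = list(arr.flat)           # iterate to see the contents
third = arr.flat[2]                # indexing works on the iterator

try:
    arr.flat.min()                 # the iterator has no reduction methods
    flat_has_min = True
except AttributeError:
    flat_has_min = False

same_min = arr.ravel().min() == arr.min()
```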

Since numpy provides so many operations that do not require explicit loops to be written, ndarray.flat, and iterators in general, are rarely used compared to ndarray.ravel().

That being said, there are situations where an iterator is preferable. If your array is large enough and you are trying to inspect all the elements one-by-one, an iterator would work well. This is especially true if you have something like a memory-mapped array that gets loaded in portions.

Differences between X.ravel() and X.reshape(s0*s1*s2) when number of axes known

Look at their __array_interface__ and do some timings. The only difference that I can see is that ravel is faster.

.flatten() has a more significant difference - it returns a copy.
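One way to do that comparison is to look at the data pointer stored in __array_interface__. This is only a sketch; the array A and the helper data_ptr are assumptions for illustration:

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)

def data_ptr(x):
    # Address of the first element, as reported by the array interface.
    return x.__array_interface__["data"][0]

r = A.ravel()
s = A.reshape(-1)
f = A.flatten()

ravel_same_buffer = data_ptr(r) == data_ptr(A)
reshape_same_buffer = data_ptr(s) == data_ptr(A)
flatten_same_buffer = data_ptr(f) == data_ptr(A)   # a copy lives elsewhere
```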

A.reshape(-1)

is a simpler way to use reshape.

You could study the respective docs, and see if there is something else. I haven't explored what happens when you specify order.

I would use ravel if I just want it to be 1d. I use .reshape most often to change a 1d (e.g. arange()) to nd.

e.g.

np.arange(10).reshape(2,5).ravel()

Or choose the one that makes your code most readable.


reshape and ravel are defined in numpy C code:

In https://github.com/numpy/numpy/blob/0703f55f4db7a87c5a9e02d5165309994b9b13fd/numpy/core/src/multiarray/shape.c

PyArray_Ravel(PyArrayObject *arr, NPY_ORDER order) requires nearly 100 lines of C code. And it punts to PyArray_Flatten if the order changes.

In the same file, reshape punts to newshape. That in turn returns a view if the shape doesn't actually change, tries _attempt_nocopy_reshape, and as a last resort returns a PyArray_NewCopy.

Both make use of PyArray_Newshape and PyArray_NewFromDescr - depending on how shapes and order mix and match.

So identifying where reshape (to 1d) and ravel are different would require careful study.


Another way to do this ravel is to make a new array, with a new shape, but the same data buffer:

np.ndarray((24,), dtype=A.dtype, buffer=A.data)

It times the same as reshape. Its __array_interface__ is the same. I don't recommend using this method, but it may clarify what is going on with these reshape/ravel functions. They all make a new array, with a new shape, but with shared data (if possible). Timing differences are the result of different sequences of function calls - in Python and C - not of different handling of the data.
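A sketch of that buffer trick, assuming A is a contiguous 24-element array. The dtype is passed explicitly here because np.ndarray otherwise defaults to float64, which would misread an integer buffer:

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)

# New 1-D array object wrapped around A's existing buffer (no data copied).
B = np.ndarray((24,), dtype=A.dtype, buffer=A.data)

shares = np.shares_memory(A, B)
B[0] = -5                          # visible through A as well
```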

Python numpy ravel function not flattening array

In [455]: x = np.array([np.array(['0 <= ... < 200 DM', '< 0 DM', 'no checking account'], dtype=object),
     ...:               np.array(['critical account/ other credits existing (not at this bank)',
     ...:                         'existing credits paid back duly till now'], dtype=object),
     ...:               np.array(['(vacation - does not exist?)', 'domestic appliances'],
     ...:                        dtype=object)], dtype=object)
In [456]: x
Out[456]:
array([array(['0 <= ... < 200 DM', '< 0 DM', 'no checking account'], dtype=object),
array(['critical account/ other credits existing (not at this bank)',
'existing credits paid back duly till now'], dtype=object),
array(['(vacation - does not exist?)', 'domestic appliances'],
dtype=object)], dtype=object)
In [457]: x.shape
Out[457]: (3,)
In [458]: [i.shape for i in x]
Out[458]: [(3,), (2,), (2,)]

x is a 1d array with 3 elements. Those elements are themselves arrays, with differing shape.

One way to flatten it is:

In [459]: np.hstack(x)                                                                                 
Out[459]:
array(['0 <= ... < 200 DM', '< 0 DM', 'no checking account',
'critical account/ other credits existing (not at this bank)',
'existing credits paid back duly till now',
'(vacation - does not exist?)', 'domestic appliances'],
dtype=object)
In [460]: _.shape
Out[460]: (7,)
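The same behaviour can be reproduced with a smaller object array. This is a sketch with made-up strings; np.concatenate is shown as an assumed alternative to hstack, not something from the original session:

```python
import numpy as np

# Build a 1-D object array whose elements are arrays of different lengths.
x = np.empty(3, dtype=object)
x[0] = np.array(["a", "b", "c"], dtype=object)
x[1] = np.array(["d", "e"], dtype=object)
x[2] = np.array(["f", "g"], dtype=object)

raveled = x.ravel()                     # still 3 elements: x is already 1-D
joined = np.hstack(x)                   # joins the inner arrays end to end
also_joined = np.concatenate(list(x))   # same result from a list of arrays
```

ravel has nothing to do here because, from numpy's point of view, x is already a flat array of 3 objects.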

How do numpy authors decide whether to put a function in numpy.* vs. numpy.ndarray.*?

In [495]: x = np.arange(12).reshape(3,4)      # reshape((3,4)) also                                                         
In [496]: x.flatten?
Docstring:
a.flatten(order='C')
Return a copy of the array collapsed into one dimension.

ravel method and function are "equivalent":

In [497]: x.ravel?                                                                                     
Docstring:
a.ravel([order])
Return a flattened array.
Signature: np.ravel(a, order='C')
Docstring:
Return a contiguous flattened array.

A 1-D array, containing the elements of the input, is returned. A copy is made only if needed.

By your terminology, flatten is out-of-place, ravel is not. Or in numpy's terms, ravel usually produces a view, rather than a copy.

The actual code for np.ravel is:

if isinstance(a, np.matrix):
    return asarray(a).ravel(order=order)
else:
    return asanyarray(a).ravel(order=order)

If the argument is not an array, it is turned into one. Then the method is used.

This pattern is quite common. The function does an asarray if needed, and then delegates the action to the method.
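The pattern can be sketched with a hypothetical wrapper. ravel_like is not a numpy function, just an illustration of "convert, then delegate to the method":

```python
import numpy as np

def ravel_like(a, order="C"):
    """Hypothetical function-level wrapper around the ndarray method."""
    # asanyarray converts lists and the like, and passes subclasses through.
    return np.asanyarray(a).ravel(order=order)

from_list = ravel_like([[1, 2], [3, 4]])              # works on a nested list
from_masked = ravel_like(np.ma.masked_array([[1, 2], [3, 4]]))
```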

np.reshape and x.reshape follow this pattern. They return a view where possible (they don't change the total number of elements). This view shares data, but has its own shape and strides. There is also an x.shape=... form that is a real in-place action.
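A small sketch of what "shares data, but has its own shape and strides" means, using a made-up 12-element array:

```python
import numpy as np

x = np.arange(12)
v = x.reshape(3, 4)                 # method form
w = np.reshape(x, (3, 4))           # function form, same result

item = x.itemsize
x_strides = x.strides               # one step per element
v_strides = v.strides               # one row is 4 elements wide

shares = np.shares_memory(x, v) and np.shares_memory(x, w)
v[0, 0] = 99                        # visible through x, whose shape is unchanged
```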

resize is one of the function/method pairs that has significant differences between the two. We don't use it much.
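For illustration, a sketch of how the two forms of resize differ when the new shape needs more elements. refcheck=False is passed only so the in-place call does not complain about other references to the array:

```python
import numpy as np

a = np.arange(4)

# Function: returns a new array, repeating the data to fill the new shape.
tiled = np.resize(a, (2, 3))

# Method: changes a in place, pads with zeros, and returns None.
result = a.resize((2, 3), refcheck=False)
```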

The repeat function is the same as the method. Because it normally changes the number of elements, repeat (both forms) returns a new array, with its own data. It does not return a view.

sum is another pair that returns a new array. It changes the number of elements, so a view isn't possible.
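A quick check of both pairs, as a sketch on a made-up 2x3 array:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

r_func = np.repeat(x, 2)            # function form
r_meth = x.repeat(2)                # method form, same result
col_sums = x.sum(axis=0)            # fewer elements than x

repeat_shares = np.shares_memory(x, r_meth)
sum_shares = np.shares_memory(x, col_sums)
```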

As for randn, its docs explain the difference. Specifying the shape as a tuple might well be the preferred 'standard', but this randn behavior is unusual. The suggested alternative for new code, standard_normal, takes the size tuple. reshape accepts either.
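The calling conventions side by side, as a sketch using the legacy np.random functions named above:

```python
import numpy as np

a = np.random.randn(2, 3)               # dimensions as separate arguments
b = np.random.standard_normal((2, 3))   # shape as a single tuple

x = np.arange(6)
r1 = x.reshape(2, 3)                    # reshape accepts separate integers
r2 = x.reshape((2, 3))                  # or one tuple
```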

While the normal tuple syntax is (1,2,3), the () are actually optional; it's the comma that marks the tuple. It's required in a 1-element tuple, e.g. (1,). In indexing, x[(1,2)] and x[1,2] are the same; both pass a tuple to x.__getitem__.
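A few lines to check this, on a made-up 3x4 array:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

t = 1, 2                            # the comma makes the tuple, not the ()
single = (1,)                       # one-element tuple needs the comma
not_a_tuple = (1)                   # this is just the integer 1

same_element = x[(1, 2)] == x[1, 2] == x[t] == x.__getitem__((1, 2))
```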

Both Python and numpy have long histories. Choices made in the past are still with us in one way or another. Refining the code is slow; adding features is easier than removing them.


