What Does -1 Mean in Numpy Reshape

Reshaping data in NumPy with (-1, 1): what does it mean?

reshape(-1) gives a flat 1-D array (a row of values), whereas reshape(-1, 1) gives a column, i.e. a 2-D array with a single column:

>>> import numpy as np
>>> a = np.linspace(1,6,6).reshape(2,3)
>>> a
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])
>>> a.shape
(2, 3)
>>> a.reshape(-1)
array([ 1.,  2.,  3.,  4.,  5.,  6.])
>>> a.reshape(-1,1)
array([[ 1.],
       [ 2.],
       [ 3.],
       [ 4.],
       [ 5.],
       [ 6.]])

What does np.reshape(-1) do

It seems that it just returns the flattened array, regardless of the initial shape.

>>> x = np.arange(27).reshape((3,3,3))
>>> y = x.reshape(-1)
>>> y.shape
(27,)
>>> y
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26])
>>> x = np.arange(10)
>>> y = x.reshape(-1)
>>> y
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

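Also, note how the new shape is passed. The reshape method accepts it either as separate integers, a.reshape(2, -1), or as a tuple, a.reshape((2, -1)). The function form np.reshape(a, (2, -1)) takes the shape as a single argument, so there it should be a tuple.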
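As a quick check of how the shape argument can be passed, here is a minimal sketch. The array a is only an illustration, not taken from the question.

```python
import numpy as np

a = np.arange(6)

# Method form: separate integers and a tuple give the same result.
m1 = a.reshape(2, -1)
m2 = a.reshape((2, -1))

# Function form: the shape is passed as one argument (a tuple here).
m3 = np.reshape(a, (2, -1))

print(m1.shape, m2.shape, m3.shape)  # (2, 3) (2, 3) (2, 3)
```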

What does -1 in numpy reshape mean?

It means that the size of the dimension for which you passed -1 is inferred from the total number of elements and the other dimensions. Thus,

A.reshape(-1, 28*28)

means, "reshape A so that its second dimension has a size of 28*28 and calculate the correct size of the first dimension".
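To make the inference concrete, here is a small sketch. The array A below is a hypothetical batch of 100 images of 28 x 28 pixels; the original A is not shown in the question.

```python
import numpy as np

# Hypothetical batch of 100 grayscale images, 28 x 28 pixels each.
A = np.zeros((100, 28, 28))

flat = A.reshape(-1, 28 * 28)
print(flat.shape)  # (100, 784)

# The inference only works if the total size is divisible by the known dimensions.
try:
    A.reshape(-1, 1000)
    inferred_ok = True
except ValueError:
    inferred_ok = False
print(inferred_ok)  # False
```

Only one dimension can be left as -1, since otherwise the missing sizes would be ambiguous.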

See the documentation of numpy.reshape.

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample

You should reshape your X to be a 2D array, not a 1D array. Fitting a model requires a 2D array, i.e. (n_samples, n_features).

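import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([2.0, 2.4, 1.5, 3.5, 3.5, 3.5, 3.5, 3.7, 3.7])
y = np.array([196, 221, 136, 255, 244, 230, 232, 255, 267])

lr = LinearRegression()
# reshape(-1, 1) turns the 1D array into a single-feature column of shape (9, 1)
lr.fit(x.reshape(-1, 1), y)

# predict also expects a 2D input: here one sample with one feature
print(lr.predict([[2.4]]))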
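The two reshapes named in the error message differ only in which axis gets the data. A minimal NumPy-only sketch, using a shortened version of x:

```python
import numpy as np

x = np.array([2.0, 2.4, 1.5])

single_feature = x.reshape(-1, 1)  # 3 samples, 1 feature
single_sample = x.reshape(1, -1)   # 1 sample, 3 features

print(x.shape)               # (3,)
print(single_feature.shape)  # (3, 1)
print(single_sample.shape)   # (1, 3)
```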

What is the meaning of reshape(-1, 1, 2)?

Your question is not entirely clear, so I'm guessing the -1 part is what troubles you.

From the documentation:

The new shape should be compatible with the original shape. If an integer, then the result will be a 1-D array of that length. One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions.

The whole line meaning is this (breaking it down for simplicity):

  1. points = np.array([x, y]) -> create a 2 x 5 array consisting of x and y
  2. .T -> transpose it to 5 x 2
  3. .reshape(-1, 1, 2) -> reshape it, in this case to a 5 x 1 x 2 array (as can be seen from the output of points.shape: (5L, 1L, 2L))
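The three steps can be sketched as follows. The values of x and y are hypothetical, since the originals are not shown; any two arrays of length 5 behave the same way.

```python
import numpy as np

# Hypothetical coordinates; the original x and y are not shown.
x = np.arange(5)
y = x ** 2

points = np.array([x, y])            # shape (2, 5)
pairs = points.T                     # shape (5, 2): one (x, y) pair per row
reshaped = pairs.reshape(-1, 1, 2)   # shape (5, 1, 2): -1 is inferred as 5

print(points.shape, pairs.shape, reshaped.shape)  # (2, 5) (5, 2) (5, 1, 2)
print(reshaped[2, 0])  # [2 4]
```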

np.reshape(x, (-1,1)) vs x[:, np.newaxis]

Both approaches return views of the same data here, so 'data contiguity' is likely a non-issue: the data is not changed, only the view of it. See Numpy: use reshape or newaxis to add dimensions.

However, there may be a practical advantage to using .reshape((-1, 1)): it always produces a 2-D array with a single column, regardless of the original shape. With [:, np.newaxis], the result depends on the original shape of the array. Consider these:

In [3]: a1 = np.array([0, 1, 2])

In [4]: a2 = np.array([[0, 1, 2]])

In [5]: a1.reshape((-1, 1))
Out[5]:
array([[0],
       [1],
       [2]])

In [6]: a2.reshape((-1, 1))
Out[6]:
array([[0],
       [1],
       [2]])

In [7]: a1[:, np.newaxis]
Out[7]:
array([[0],
       [1],
       [2]])

In [8]: a2[:, np.newaxis]
Out[8]: array([[[0, 1, 2]]])
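The claim that both approaches return views of the same data can be checked with np.shares_memory. A minimal sketch for the 1-D case:

```python
import numpy as np

a1 = np.array([0, 1, 2])

r = a1.reshape((-1, 1))
n = a1[:, np.newaxis]

print(r.shape, n.shape)         # (3, 1) (3, 1)
print(np.shares_memory(a1, r))  # True
print(np.shares_memory(a1, n))  # True

# Writing through either result changes the original array.
r[0, 0] = 10
n[1, 0] = 20
print(a1)  # [10 20  2]
```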

What does np.reshape(2, -1) do?

The reshape method, as the name suggests, reshapes a NumPy array to the given dimensions. np.arange(10) gives you a 1-D array of shape (10,). reshape expects the new dimensions, either as separate integers or as a tuple, for example (2, 5). A -1 tells NumPy to work out that dimension from the size of the array and the other dimensions. In your case the first dimension is 2 and the array has 10 elements, so the second dimension must be 5, and reshape fills this in automatically. That's why your output has shape (2, 5). The reshape documentation covers the details.
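A short sketch of that case, assuming the array in question is np.arange(10):

```python
import numpy as np

x = np.arange(10)
print(x.shape)  # (10,)

y = x.reshape(2, -1)  # -1 is inferred as 10 / 2 = 5
print(y.shape)  # (2, 5)
print(y)
# [[0 1 2 3 4]
#  [5 6 7 8 9]]
```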

Difference between numpy.array shape (R, 1) and (R,)

1. The meaning of shapes in NumPy

You write, "I know literally it's list of numbers and list of lists where all list contains only a number" but that's a bit of an unhelpful way to think about it.

The best way to think about NumPy arrays is that they consist of two parts, a data buffer which is just a block of raw elements, and a view which describes how to interpret the data buffer.

For example, if we create an array of 12 integers:

>>> a = numpy.arange(12)
>>> a
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

Then a consists of a data buffer, arranged something like this:

┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

and a view which describes how to interpret the data:

>>> a.flags
C_CONTIGUOUS : True
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
>>> a.dtype
dtype('int64')
>>> a.itemsize
8
>>> a.strides
(8,)
>>> a.shape
(12,)

Here the shape (12,) means the array is indexed by a single index which runs from 0 to 11. Conceptually, if we label this single index i, the array a looks like this:

i= 0    1    2    3    4    5    6    7    8    9   10   11
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

If we reshape an array, this doesn't change the data buffer. Instead, it creates a new view that describes a different way to interpret the data. So after:

>>> b = a.reshape((3, 4))

the array b has the same data buffer as a, but now it is indexed by two indices which run from 0 to 2 and 0 to 3 respectively. If we label the two indices i and j, the array b looks like this:

i= 0    0    0    0    1    1    1    1    2    2    2    2
j= 0    1    2    3    0    1    2    3    0    1    2    3
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

which means that:

>>> b[2,1]
9
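The "same buffer, different view" idea can be observed through the strides. This is a minimal sketch; the dtype is set explicitly to int64 here (an assumption, so that the byte counts match the 8-byte items shown above on every platform).

```python
import numpy as np

a = np.arange(12, dtype=np.int64)  # explicit dtype so the strides are predictable
b = a.reshape((3, 4))

print(b.base is a)  # True: b is a view on a's buffer
print(a.strides)    # (8,)
print(b.strides)    # (32, 8): a step in i skips 4 elements, a step in j skips 1

# b[i, j] reads buffer element 4*i + j
print(b[2, 1], a[4 * 2 + 1])  # 9 9

# Writing through the view changes the shared buffer.
b[2, 1] = -1
print(a[9])  # -1
```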

You can see that the second index changes quickly and the first index changes slowly. If you prefer this to be the other way round, you can specify the order parameter:

>>> c = a.reshape((3, 4), order='F')

which results in an array indexed like this:

i= 0    1    2    0    1    2    0    1    2    0    1    2
j= 0    0    0    1    1    1    2    2    2    3    3    3
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

which means that:

>>> c[2,1]
5
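Printing the Fortran-ordered result makes the index pattern easier to see. A minimal sketch, again with an explicit int64 dtype (an assumption made so the strides are the same on every platform):

```python
import numpy as np

a = np.arange(12, dtype=np.int64)
c = a.reshape((3, 4), order='F')

print(c)
# [[ 0  3  6  9]
#  [ 1  4  7 10]
#  [ 2  5  8 11]]
print(c.strides)               # (8, 24): the first index now moves fastest
print(c[2, 1], a[2 + 3 * 1])   # 5 5
print(np.shares_memory(a, c))  # True: still the same buffer
```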

It should now be clear what it means for an array to have a shape with one or more dimensions of size 1. After:

>>> d = a.reshape((12, 1))

the array d is indexed by two indices, the first of which runs from 0 to 11, and the second index is always 0:

i= 0    1    2    3    4    5    6    7    8    9   10   11
j= 0    0    0    0    0    0    0    0    0    0    0    0
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

and so:

>>> d[10,0]
10
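In practice, the difference between shape (12,) and shape (12, 1) shows up in indexing and in how the arrays combine with others. A small sketch; the broadcasting lines are an illustration added here, not part of the original answer:

```python
import numpy as np

a = np.arange(12)
d = a.reshape((12, 1))

print(a.ndim, d.ndim)  # 1 2
print(a[10])           # 10
print(d[10])           # [10]  (a 1-D array of length 1)
print(d[10, 0])        # 10

# Broadcasting treats the two shapes differently.
print((a + a).shape)   # (12,)
print((d + a).shape)   # (12, 12)
```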

A dimension of length 1 is "free" (in some sense), so there's nothing stopping you from going to town:

>>> e = a.reshape((1, 2, 1, 6, 1))

giving an array indexed like this:

i= 0    0    0    0    0    0    0    0    0    0    0    0
j= 0    0    0    0    0    0    1    1    1    1    1    1
k= 0    0    0    0    0    0    0    0    0    0    0    0
l= 0    1    2    3    4    5    0    1    2    3    4    5
m= 0    0    0    0    0    0    0    0    0    0    0    0
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

and so:

>>> e[0,1,0,0,0]
6

See the NumPy internals documentation for more details about how arrays are implemented.

2. What to do?

Since numpy.reshape normally just creates a new view (it copies only when the requested shape cannot be expressed over the existing buffer), you shouldn't be scared about using it whenever necessary. It's the right tool to use when you want to index an array in a different way.
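reshape returns a view whenever the existing buffer allows it. One case where it cannot is flattening a transposed array, where NumPy falls back to a copy. A minimal sketch of both cases:

```python
import numpy as np

a = np.arange(12)

view = a.reshape((3, 4))
print(np.shares_memory(a, view))  # True

t = view.T             # shape (4, 3), no longer contiguous in C order
flat = t.reshape(-1)   # cannot be described by strides alone, so NumPy copies
print(np.shares_memory(a, flat))  # False
print(flat[:4])  # [0 4 8 1]
```

np.shares_memory is a convenient way to check which case you are in.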

However, in a long computation it's usually possible to arrange to construct arrays with the "right" shape in the first place, and so minimize the number of reshapes and transposes. But without seeing the actual context that led to the need for a reshape, it's hard to say what should be changed.

The example in your question is:

numpy.dot(M[:,0], numpy.ones((1, R)))

but this is not realistic. First, this expression:

M[:,0].sum()

computes the result more simply. Second, is there really something special about column 0? Perhaps what you actually need is:

M.sum(axis=0)
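The equivalence can be checked on a small example. M and R below are hypothetical, since the original matrix is not shown, and the dot product uses a 1-D np.ones(R) rather than the (1, R) array from the question so that the shapes line up.

```python
import numpy as np

R = 4
M = np.arange(R * 3, dtype=float).reshape(R, 3)  # hypothetical (R, 3) matrix

col0 = M[:, 0]                      # shape (R,)
via_dot = np.dot(col0, np.ones(R))  # 1-D dot 1-D gives a scalar
via_sum = col0.sum()

print(via_dot, via_sum)  # 18.0 18.0
print(M.sum(axis=0))     # [18. 22. 26.]
```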

