Difference between numpy.array shape (R, 1) and (R,)

1. The meaning of shapes in NumPy

You write, "I know literally it's list of numbers and list of lists where all list contains only a number" but that's a bit of an unhelpful way to think about it.

The best way to think about NumPy arrays is that they consist of two parts, a data buffer which is just a block of raw elements, and a view which describes how to interpret the data buffer.

For example, if we create an array of 12 integers:

>>> a = numpy.arange(12)
>>> a
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

Then a consists of a data buffer, arranged something like this:

┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

and a view which describes how to interpret the data:

>>> a.flags
C_CONTIGUOUS : True
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
>>> a.dtype
dtype('int64')
>>> a.itemsize
8
>>> a.strides
(8,)
>>> a.shape
(12,)

Here the shape (12,) means the array is indexed by a single index which runs from 0 to 11. Conceptually, if we label this single index i, the array a looks like this:

i= 0    1    2    3    4    5    6    7    8    9   10   11
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

If we reshape an array, this doesn't change the data buffer. Instead, it creates a new view that describes a different way to interpret the data. So after:

>>> b = a.reshape((3, 4))

the array b has the same data buffer as a, but now it is indexed by two indices which run from 0 to 2 and 0 to 3 respectively. If we label the two indices i and j, the array b looks like this:

i= 0    0    0    0    1    1    1    1    2    2    2    2
j= 0    1    2    3    0    1    2    3    0    1    2    3
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

which means that:

>>> b[2,1]
9
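
One way to check that a and b really share a single buffer is to ask NumPy directly. The sketch below is only an illustration: it fixes the dtype to int64 (an assumption, so that the byte strides come out the same on every platform) and uses np.shares_memory and the base attribute to inspect the relationship.

```python
import numpy as np

# Fix the dtype so the byte strides below do not depend on the platform.
a = np.arange(12, dtype=np.int64)
b = a.reshape((3, 4))

print(np.shares_memory(a, b))  # True: b is a view onto a's buffer
print(b.base is a)             # True: b does not own its data
print(a.strides, b.strides)    # (8,) (32, 8)

# Element b[i, j] lives i*32 + j*8 bytes into the buffer, i.e. at a[4*i + j].
i, j = 2, 1
offset = (i * b.strides[0] + j * b.strides[1]) // a.itemsize
print(offset, b[i, j])         # 9 9

# Writing through the view changes the original array as well.
b[0, 0] = 100
print(a[0])                    # 100
```

The strides are what turn a pair of indices into a position in the flat buffer, which is why a reshape of this kind costs nothing.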

You can see that the second index changes quickly and the first index changes slowly. If you prefer this to be the other way round, you can specify the order parameter:

>>> c = a.reshape((3, 4), order='F')

which results in an array indexed like this:

i= 0    1    2    0    1    2    0    1    2    0    1    2
j= 0    0    0    1    1    1    2    2    2    3    3    3
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

which means that:

>>> c[2,1]
5
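
The difference between the two orders shows up in the strides. The following sketch again assumes an int64 dtype so the byte counts are predictable; it compares the default C order with order='F' and checks that the F-ordered result matches a reshape to (4, 3) followed by a transpose.

```python
import numpy as np

a = np.arange(12, dtype=np.int64)   # int64 assumed so strides are predictable
b = a.reshape((3, 4))               # default order='C' (row-major)
c = a.reshape((3, 4), order='F')    # column-major

print(b.strides)  # (32, 8): stepping j moves 8 bytes, stepping i moves 32
print(c.strides)  # (8, 24): stepping i moves 8 bytes, stepping j moves 24
print(c[2, 1])    # 5, i.e. a[2 + 3*1]

# The same indexing can be obtained by reshaping to (4, 3) and transposing.
print(np.array_equal(c, a.reshape((4, 3)).T))  # True
```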

It should now be clear what it means for an array to have a shape with one or more dimensions of size 1. After:

>>> d = a.reshape((12, 1))

the array d is indexed by two indices, the first of which runs from 0 to 11, and the second index is always 0:

i= 0    1    2    3    4    5    6    7    8    9   10   11
j= 0    0    0    0    0    0    0    0    0    0    0    0
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

and so:

>>> d[10,0]
10

A dimension of length 1 is "free" (in some sense), so there's nothing stopping you from going to town:

>>> e = a.reshape((1, 2, 1, 6, 1))

giving an array indexed like this:

i= 0    0    0    0    0    0    0    0    0    0    0    0
j= 0    0    0    0    0    0    1    1    1    1    1    1
k= 0    0    0    0    0    0    0    0    0    0    0    0
l= 0    1    2    3    4    5    0    1    2    3    4    5
m= 0    0    0    0    0    0    0    0    0    0    0    0
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

and so:

>>> e[0,1,0,0,0]
6
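
Because length-1 axes are free, NumPy has helpers for dropping them and putting them back. A short sketch using np.squeeze, np.newaxis and np.expand_dims (the array names follow the example above):

```python
import numpy as np

a = np.arange(12)
e = a.reshape((1, 2, 1, 6, 1))

print(e[0, 1, 0, 0, 0])             # 6
print(np.squeeze(e).shape)          # (2, 6): every length-1 axis dropped
print(np.squeeze(e, axis=0).shape)  # (2, 1, 6, 1): only the first axis dropped

# Adding length-1 axes back: np.newaxis (an alias for None) or np.expand_dims.
print(a[:, np.newaxis].shape)           # (12, 1)
print(a[np.newaxis, :].shape)           # (1, 12)
print(np.expand_dims(a, axis=1).shape)  # (12, 1)
```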

See the NumPy internals documentation for more details about how arrays are implemented.

2. What to do?

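Since numpy.reshape normally just creates a new view (it only copies when the requested layout cannot be described by strides over the existing buffer), you shouldn't be scared about using it whenever necessary. It's the right tool to use when you want to index an array in a different way.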

However, in a long computation it's usually possible to arrange to construct arrays with the "right" shape in the first place, and so minimize the number of reshapes and transposes. But without seeing the actual context that led to the need for a reshape, it's hard to say what should be changed.

The example in your question is:

numpy.dot(M[:,0], numpy.ones((1, R)))

but this is not realistic. First, this expression:

M[:,0].sum()

computes the result more simply. Second, is there really something special about column 0? Perhaps what you actually need is:

M.sum(axis=0)
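
To make the shapes involved concrete, here is a small sketch. The sizes R and C and the contents of M are made up for illustration and are not taken from the question. It shows that indexing with an integer drops the axis while slicing keeps it, and which of the two lines up with numpy.ones((1, R)) in a dot product.

```python
import numpy as np

R, C = 4, 3                      # hypothetical sizes, not from the question
M = np.arange(R * C, dtype=float).reshape((R, C))

col_1d = M[:, 0]                 # integer index drops the axis -> shape (R,)
col_2d = M[:, 0:1]               # length-1 slice keeps the axis -> shape (R, 1)
print(col_1d.shape, col_2d.shape)   # (4,) (4, 1)

# (R, 1) dot (1, R) gives an (R, R) result.
print(np.dot(col_2d, np.ones((1, R))).shape)   # (4, 4)

# (R,) dot (1, R) does not line up: the inner dimensions are R and 1.
try:
    np.dot(col_1d, np.ones((1, R)))
except ValueError as exc:
    print("ValueError:", exc)

# Summing a column, or every column at once, needs no reshape at all.
print(col_1d.sum())              # 18.0
print(M.sum(axis=0))             # [18. 22. 26.]
```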

Why use arrays of shape (x,) rather than (x,1)?

Each item in the shape tuple corresponds to one axis. A tuple with one item means the array is one-dimensional (one axis); a tuple with two items means it is two-dimensional, and so on. When you do a.shape = [4,1] you are simply converting your 1D array to a 2D one:

In [26]: a = np.array([1,2,3,4])
In [27]: a.shape = [4,1]

In [28]: a.shape
Out[28]: (4, 1)

In [29]: a
Out[29]:
array([[1],
       [2],
       [3],
       [4]])
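
The same conversion can be done without modifying a in place. The sketch below uses reshape(-1, 1), where -1 asks NumPy to infer that axis, and shows how the number of axes affects indexing. As far as I know, the NumPy documentation for ndarray.shape describes assigning to the attribute as discouraged in favour of reshape, so treat this as the more conventional spelling rather than the only one.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
col = a.reshape(-1, 1)      # -1 lets NumPy infer that axis: shape (4, 1)

print(a.shape, a.ndim)      # (4,) 1   -> a is left unchanged
print(col.shape, col.ndim)  # (4, 1) 2

# Indexing reflects the number of axes.
print(a[2])                 # 3
print(col[2])               # [3]  (a length-1 row)
print(col[2, 0])            # 3

# ravel() goes back to one axis.
print(col.ravel().shape)    # (4,)
```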

For a np.array([1, 2, 3]) why is the shape (3,) instead of (3,1)?

In a nutshell, this is because it's a one-dimensional array (hence the one-element shape tuple). Perhaps the following will help clear things up:

>>> np.array([1, 2, 3]).shape
(3,)
>>> np.array([[1, 2, 3]]).shape
(1, 3)
>>> np.array([[1], [2], [3]]).shape
(3, 1)

We can even go three dimensions (and higher):

>>> np.array([[[1]], [[2]], [[3]]]).shape
(3, 1, 1)

What is the difference between a numpy array of size (100, 1) and (100,)?

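First question's answer: a, with shape (100,), is a one-dimensional array (a vector), and b, with shape (100, 1), is a two-dimensional array (a matrix with a single column). See this Stack Overflow question for more details: Difference between numpy.array shape (R, 1) and (R,)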

Second question's answer:

I think converting one form to the other should work fine. The function you provided appears to expect vectors, so reshape b using b = b.reshape(-1), which converts it to a single dimension (a vector). See the example below for reference:

>>> import numpy as np
>>> from scipy.stats import pearsonr
>>> a = np.random.random((100,))
>>> b = np.random.random((100,1))
>>> print(a.shape, b.shape)
(100,) (100, 1)
>>> r, p = pearsonr(a, b)  # pearsonr returns (correlation coefficient, p-value)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\xyz\Appdata\Local\Continuum\Anaconda3\lib\site-packages\scipy\stats\stats.py", line 3042, in pearsonr
    r = max(min(r, 1.0), -1.0)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>> b = b.reshape(-1)
>>> r, p = pearsonr(a, b)
>>> print(r, p)
0.10899671932026986 0.280372238354364
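
There are several ways to drop the trailing length-1 axis besides reshape(-1). The sketch below lists a few and checks that each gives a view rather than a copy. To stay NumPy-only it uses np.corrcoef as a stand-in for pearsonr; that substitution, the seed, and the variable names are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only to make the sketch repeatable
a = rng.random(100)              # shape (100,)
b = rng.random((100, 1))         # shape (100, 1)

# Four equivalent ways to drop the trailing length-1 axis.
flat = [b.reshape(-1), b.ravel(), b[:, 0], np.squeeze(b, axis=1)]
print([f.shape for f in flat])   # [(100,), (100,), (100,), (100,)]
print(all(np.shares_memory(b, f) for f in flat))  # True: views, not copies

# np.corrcoef stands in for pearsonr here; it returns a 2x2 matrix whose
# off-diagonal entry is the correlation coefficient of the two 1-D inputs.
r = np.corrcoef(a, flat[0])[0, 1]
print(-1.0 <= r <= 1.0)          # True
```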

What is the difference between np.array([val1, val2]) and np.array([[val1, val2]])?

np.array([1, 2]) builds an array from a single flat list, giving you a 1D array with the shape (2,).

With the double [ you are actually passing a list of lists, so you get a two-dimensional array, or matrix, with the shape (1, 2): one row and two columns.

With the latter you are able to build more complex matrices like:

np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

rendering a 3x3 matrix:

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
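
The practical consequence of the extra pair of brackets is how many indices are needed to reach an element. A brief sketch (x and y are illustrative names):

```python
import numpy as np

x = np.array([1, 2])       # one level of nesting  -> shape (2,)
y = np.array([[1, 2]])     # two levels of nesting -> shape (1, 2)

print(x.ndim, x.shape)     # 1 (2,)
print(y.ndim, y.shape)     # 2 (1, 2)

print(x[1])                # 2: one index reaches an element
print(y[0])                # [1 2]: one index only selects a row
print(y[0, 1])             # 2: two indices reach an element

# Transposing only has a visible effect when there are two axes to swap.
print(x.T.shape, y.T.shape)  # (2,) (2, 1)
```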

numpy: What is the difference between a vector of shape (5,1) and (5,)?

np.dot also accepts 1D arrays (two 1D arrays give the inner product), but to get matrix behavior with distinct row and column vectors you need 2D arrays. An array with shape (5,) is a flat (1D) array of 5 items, whereas one with shape (5, 1) is a matrix with 5 rows and 1 column.

>>> import numpy as np
>>> np.zeros((5,))
array([ 0.,  0.,  0.,  0.,  0.])  # single flat array
>>> np.zeros((1,5))
array([[ 0.,  0.,  0.,  0.,  0.]])  # array within an array
>>> np.zeros((5,1))
array([[ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.]])
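
The two shapes also behave differently under np.dot and under broadcasting. The following sketch, with made-up values, shows the inner product of two 1D arrays, the matrix products of their 2D counterparts, and what happens when a (5,) array is added to a (5, 1) array.

```python
import numpy as np

v = np.arange(5.0)           # shape (5,)
col = v.reshape(5, 1)        # shape (5, 1)

# np.dot on two 1-D arrays is the inner product and returns a scalar.
print(np.dot(v, v))          # 30.0

# With 2-D arrays the usual matrix rules apply.
print(np.dot(col.T, col))        # [[30.]]
print(np.dot(col, col.T).shape)  # (5, 5)

# Broadcasting treats the two shapes differently.
print((v + v).shape)         # (5,)
print((v + col).shape)       # (5, 5): (5,) is stretched against (5, 1)
```

Accidentally mixing the two shapes in an elementwise operation is a common source of unexpectedly large results, so it is worth checking .shape when a sum or difference comes out two-dimensional.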

