np.reshape(x, (-1,1)) vs x[:, np.newaxis]
Both ways return views of the same data, so 'data contiguity' is likely a non-issue: the data is not changed, only the view of it. See Numpy: use reshape or newaxis to add dimensions.
However, there may be a practical advantage to using .reshape((-1, 1)): it reshapes the array into a 2D column regardless of the original shape. With [:, np.newaxis], the result depends on the original shape of the array. Consider these:
In [3]: a1 = np.array([0, 1, 2])
In [4]: a2 = np.array([[0, 1, 2]])
In [5]: a1.reshape((-1, 1))
Out[5]:
array([[0],
       [1],
       [2]])
In [6]: a2.reshape((-1, 1))
Out[6]:
array([[0],
       [1],
       [2]])
In [7]: a1[:, np.newaxis]
Out[7]:
array([[0],
       [1],
       [2]])
In [8]: a2[:, np.newaxis]
Out[8]: array([[[0, 1, 2]]])
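To confirm the first claim above, both spellings can be checked to be views onto the same buffer; a minimal sketch (variable names are just for illustration):

```python
import numpy as np

a = np.array([0, 1, 2])

col_reshape = a.reshape((-1, 1))
col_newaxis = a[:, np.newaxis]

# Both are views onto a's buffer, not copies
assert np.shares_memory(a, col_reshape)
assert np.shares_memory(a, col_newaxis)

# Writing through one view is visible in the original
col_reshape[0, 0] = 99
print(a)  # [99  1  2]
```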
Numpy: use reshape or newaxis to add dimensions
I don't see evidence of much difference. You could do a time test on very large arrays. Basically both fiddle with the shape, and possibly the strides. __array_interface__ is a nice way of accessing this information. For example:
In [94]: b.__array_interface__
Out[94]:
{'data': (162400368, False),
'descr': [('', '<f8')],
'shape': (5,),
'strides': None,
'typestr': '<f8',
'version': 3}
In [95]: b[None,:].__array_interface__
Out[95]:
{'data': (162400368, False),
'descr': [('', '<f8')],
'shape': (1, 5),
'strides': (0, 8),
'typestr': '<f8',
'version': 3}
In [96]: b.reshape(1,5).__array_interface__
Out[96]:
{'data': (162400368, False),
'descr': [('', '<f8')],
'shape': (1, 5),
'strides': None,
'typestr': '<f8',
'version': 3}
Both create a view, using the same data buffer as the original. Same shape, but reshape doesn't change the strides. reshape lets you specify the order. And .flags shows differences in the C_CONTIGUOUS flag.
reshape may be faster because it is making fewer changes. But either way the operation shouldn't affect the time of larger calculations much. E.g. for large b:
In [123]: timeit np.outer(b.reshape(1,-1),b)
1 loops, best of 3: 288 ms per loop
In [124]: timeit np.outer(b[None,:],b)
1 loops, best of 3: 287 ms per loop
An interesting observation: b.reshape(1,4).strides -> (32, 8)
Here's my guess: __array_interface__ is displaying an underlying attribute, and .strides is more like a property (though it may all be buried in C code). The default underlying value is None, and when needed for calculation (or display with .strides) it is calculated from the shape and item size. 32 is the distance in bytes to the end of the 1st row (4 x 8). np.ones((2,4)).strides has the same (32, 8) (and None in __array_interface__).
b[None,:], on the other hand, is preparing the array for broadcasting. When broadcast, existing values are used repeatedly. That's what the 0 in (0, 8) does.
In [147]: b1=np.broadcast_arrays(b,np.zeros((2,1)))[0]
In [148]: b1.shape
Out[148]: (2, 5)
In [149]: b1.strides
Out[149]: (0, 8)
In [150]: b1.__array_interface__
Out[150]:
{'data': (3023336880L, False),
'descr': [('', '<f8')],
'shape': (2, 5),
'strides': (0, 8),
'typestr': '<f8',
'version': 3}
b1 displays the same as np.ones((2, 5)), but has only 5 items.
np.broadcast_arrays is a function in numpy/lib/stride_tricks.py. It uses as_strided from the same file. These functions directly play with the shape and strides attributes.
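The same stride-0 trick can be reproduced directly with as_strided; a sketch (setting the leading stride to 0 makes every "row" reuse the same five values without copying):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

b = np.arange(5.0)                      # shape (5,), strides (8,)
b2 = as_strided(b, shape=(2, 5), strides=(0, b.itemsize))

print(b2.strides)                       # (0, 8)
assert np.shares_memory(b, b2)
assert (b2[0] == b2[1]).all()           # both rows are the same memory
```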
Add multiple np.newaxis as needed?
There's a builtin for that:
np.less_equal.outer(A, B)
Another way would be with reshaping to accommodate the new axes:
A.reshape(list(A.shape) + [1]*B.ndim) <= B
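The two spellings should agree; a small check under assumed toy shapes (A 1D, B 2D here, chosen only for illustration):

```python
import numpy as np

A = np.arange(3)                 # shape (3,)
B = np.arange(6).reshape(2, 3)   # shape (2, 3)

# ufunc.outer compares every element of A with every element of B,
# giving a result of shape A.shape + B.shape
out_builtin = np.less_equal.outer(A, B)

# Same thing via reshaping: append one length-1 axis per dimension of B,
# then let broadcasting expand the comparison
out_reshape = A.reshape(list(A.shape) + [1] * B.ndim) <= B

assert out_builtin.shape == (3, 2, 3)
assert (out_builtin == out_reshape).all()
```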
Using np.newaxis to compute sum of squared differences
Why is a third axis created? What is the best way to visualize what is going on?
Adding new dimensions before adding/subtracting is a relatively common trick to generate all pairs, by using broadcasting (None is the same as np.newaxis here):
>>> a = np.arange(10)
>>> a[:,None]
array([[0],
       [1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8],
       [9]])
>>> a[None,:]
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
>>> a[:,None] + 100*a[None,:]
array([[  0, 100, 200, 300, 400, 500, 600, 700, 800, 900],
       [  1, 101, 201, 301, 401, 501, 601, 701, 801, 901],
       [  2, 102, 202, 302, 402, 502, 602, 702, 802, 902],
       [  3, 103, 203, 303, 403, 503, 603, 703, 803, 903],
       [  4, 104, 204, 304, 404, 504, 604, 704, 804, 904],
       [  5, 105, 205, 305, 405, 505, 605, 705, 805, 905],
       [  6, 106, 206, 306, 406, 506, 606, 706, 806, 906],
       [  7, 107, 207, 307, 407, 507, 607, 707, 807, 907],
       [  8, 108, 208, 308, 408, 508, 608, 708, 808, 908],
       [  9, 109, 209, 309, 409, 509, 609, 709, 809, 909]])
Your example does the same, just with 2-vectors instead of scalars at the innermost level:
>>> X[:,np.newaxis,:].shape
(10, 1, 2)
>>> X[np.newaxis,:,:].shape
(1, 10, 2)
>>> (X[:,np.newaxis,:] - X[np.newaxis,:,:]).shape
(10, 10, 2)
Thus we find that the 'magical subtraction' is just all combinations of the coordinates X subtracted from each other.
Is there a more intuitive way to perform this calculation?
Yes, use scipy.spatial.distance.pdist for pairwise distances. To get an equivalent result to your example:
from scipy.spatial.distance import pdist, squareform
dist_sq = squareform(pdist(X))**2
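The two routes give the same matrix; a quick consistency check, assuming X is an (N, 2) coordinate array as in the question (the random data here is made up):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X = rng.random((10, 2))

# Broadcasting version: all pairwise differences, squared and summed per pair
dist_sq_bcast = ((X[:, np.newaxis, :] - X[np.newaxis, :, :]) ** 2).sum(axis=-1)

# pdist gives the condensed upper-triangle distances;
# squareform expands them to a full (10, 10) matrix
dist_sq_pdist = squareform(pdist(X)) ** 2

assert np.allclose(dist_sq_bcast, dist_sq_pdist)
```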
Numpy np.newaxis
df_train['SalePrice'] is a pandas Series (vector / 1D array) of shape (N,).
Modern (version 0.17+) scikit-learn methods don't like 1D arrays (vectors); they expect 2D arrays.
df_train['SalePrice'][:, np.newaxis] transforms the 1D array (shape: (N,)) into a 2D array (shape: (N, 1)).
Demo:
In [21]: df = pd.DataFrame(np.random.randint(10, size=(5, 3)), columns=list('abc'))
In [22]: df
Out[22]:
a b c
0 4 3 8
1 7 5 6
2 1 3 9
3 7 5 7
4 7 0 6
In [23]: from sklearn.preprocessing import StandardScaler
In [24]: df['a'].shape
Out[24]: (5,) # <--- 1D array
In [25]: df['a'][:, np.newaxis].shape
Out[25]: (5, 1) # <--- 2D array
There is a pandas way to do the same:
In [26]: df[['a']].shape
Out[26]: (5, 1) # <--- 2D array
In [27]: StandardScaler().fit_transform(df[['a']])
Out[27]:
array([[-0.5 ],
       [ 0.75],
       [-1.75],
       [ 0.75],
       [ 0.75]])
What happens if we pass a 1D array:
In [28]: StandardScaler().fit_transform(df['a'])
C:\Users\Max\Anaconda4\lib\site-packages\sklearn\utils\validation.py:429: DataConversionWarning: Data with input dtype int32 was converted to float64 by StandardScaler.
  warnings.warn(msg, _DataConversionWarning)
C:\Users\Max\Anaconda4\lib\site-packages\sklearn\preprocessing\data.py:586: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
C:\Users\Max\Anaconda4\lib\site-packages\sklearn\preprocessing\data.py:649: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
Out[28]: array([-0.5 , 0.75, -1.75, 0.75, 0.75])
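Note that recent pandas versions no longer allow multi-dimensional indexing like [:, np.newaxis] directly on a Series. A sketch of the equivalents that still work (the column name 'a' is just illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([4, 7, 1, 7, 7], name='a')

# Go through NumPy first, then add the axis; this avoids multi-dimensional
# indexing on the Series itself, which newer pandas versions reject
col = s.to_numpy()[:, np.newaxis]
assert col.shape == (5, 1)

# The pandas-native equivalent: select with a list of columns to stay 2D
df = s.to_frame()
assert df[['a']].shape == (5, 1)
```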
Why is arr[:][np.newaxis].shape = (1, n) instead of (n, 1)?
In arr[:][np.newaxis] and arr[np.newaxis][:] the indexing is done sequentially, so arr2 = arr[:][np.newaxis] is equivalent to:
arr_temp = arr[:]
arr2 = arr_temp[np.newaxis]
del arr_temp
The same logic applies with the indexing operators ordered the other way round, for arr2 = arr[np.newaxis][:]:
arr_temp = arr[np.newaxis]
arr2 = arr_temp[:]
del arr_temp
Now, to quote https://numpy.org/doc/1.19/reference/arrays.indexing.html:
Each newaxis object in the selection tuple serves to expand the dimensions of the resulting selection by one unit-length dimension. The added dimension is the position of the newaxis object in the selection tuple.
Since np.newaxis is at the first position (there is only one position) in the indexing selection tuple in both arr[np.newaxis] and arr_temp[np.newaxis], it creates the new dimension as the first dimension, and thus the resulting shape is (1, 4) in both cases.
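The whole contrast can be summarized in a few asserts (a minimal sketch, using a 4-element array):

```python
import numpy as np

arr = np.arange(4)

assert arr[np.newaxis].shape == (1, 4)     # newaxis alone -> new leading axis
assert arr[:][np.newaxis].shape == (1, 4)  # sequential indexing: same result
assert arr[np.newaxis][:].shape == (1, 4)

# One selection tuple with newaxis in the second position -> trailing axis
assert arr[:, np.newaxis].shape == (4, 1)
```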