Why Does Indexing Numpy Arrays with Brackets and Commas Differ in Behavior

Why does indexing numpy arrays with brackets and commas differ in behavior?

This:

x[:, 1]

means "take all indices of x along the first axis, but only index 1 along the second".

This:

x[:][1]

means "take all indices of x along the first axis (so all of x), then take index 1 along the first axis of the result". You're applying the 1 to the wrong axis.

x[1][2] and x[1, 2] are only equivalent because indexing an array with an integer shifts all remaining axes towards the front of the shape, so the first axis of x[1] is the second axis of x. This doesn't generalize at all; you should almost always use commas instead of multiple indexing steps.
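To make the difference concrete, here is a minimal sketch on a small 3x4 array. The array contents and the names col and row are only illustrative; they are not from the question.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)  # rows 0..2, columns 0..3

col = x[:, 1]     # index 1 along the second axis: one value per row
row = x[:][1]     # x[:] is all of x, so this is just x[1]

print(col)        # [1 5 9]
print(row)        # [4 5 6 7]

# Chained integer indexing happens to agree with the comma form ...
assert x[1][2] == x[1, 2]
# ... but as soon as a slice is involved the two diverge.
assert not np.array_equal(x[:][1], x[:, 1])
```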

Difference between np.array[a:b, c:d] and np.array[a:b][c:d]

First of all, a 2D ndarray can be sliced as a[row slice, col slice].
So x[:2, 1:4] slices the ndarray x with both a row slice ([:2]) and a column slice ([1:4]).

However, x[:2][1:4] applies the slice [:2] first, and then applies the slice [1:4] to the rows of that result.
Thus, x[:2][1:4] is the same as x[1:2].
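A quick check of that claim, as a sketch on a made-up 4x5 array (the shape and the variable names are assumptions chosen for illustration):

```python
import numpy as np

x = np.arange(20).reshape(4, 5)

both_axes = x[:2, 1:4]   # rows 0-1, columns 1-3
chained = x[:2][1:4]     # rows 0-1, then rows 1-3 of that: only row 1 is left

print(both_axes.shape)   # (2, 3)
print(chained.shape)     # (1, 5)
assert np.array_equal(chained, x[1:2])
```

The second slice in the chained form never sees the column axis, because every slice applied on its own acts on the first axis of whatever it is given.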

Why does NumPy advanced indexing yield different results for list of lists and numpy array?

In the end it comes down to what is mentioned in https://stackoverflow.com/a/40599589/7919597

From numpy's indexing documentation:

In order to remain backward compatible with a common usage in Numeric,
basic slicing is also initiated if the selection object is any
non-ndarray sequence (such as a list) containing slice objects, the
Ellipsis object, or the newaxis object, but not for integer arrays or
other embedded sequences.

The example with the list triggers an undocumented part of the backward compatibility logic, as described in a comment in the source code:

/*
 * Sequences < NPY_MAXDIMS with any slice objects
 * or newaxis, Ellipsis or other arrays or sequences
 * embedded, are considered equivalent to an indexing
 * tuple. (`a[[[1,2], [3,4]]] == a[[1,2], [3,4]]`)
 */
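The two unambiguous spellings can be compared directly. The sketch below deliberately leaves out the bare nested-list form a[[[1,2], [3,4]]], since how it is interpreted depends on the NumPy version (the compatibility path quoted above was later deprecated, as far as I know). The 5x5 array is an assumption for illustration.

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

# A tuple of two index lists pairs them up: elements (1, 3) and (2, 4).
paired = a[[1, 2], [3, 4]]
print(paired)            # [ 8 14]

# A single 2-D integer array indexes only the first axis,
# picking whole rows and keeping the index array's shape in front.
rows = a[np.array([[1, 2], [3, 4]])]
print(rows.shape)        # (2, 2, 5)
```

Wrapping the index in np.array (or writing the tuple explicitly) removes the ambiguity regardless of version.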

How is it possible for Numpy to use comma-separated subscripting with `:`?

Define a simple class with a __getitem__ indexing method:

In [128]: class Foo():
     ...:     def __getitem__(self, arg):
     ...:         print(type(arg), arg)
     ...: 
In [129]: f = Foo()

And look at what different indexes produce:

In [130]: f[:]
<class 'slice'> slice(None, None, None)
In [131]: f[1:2:3]
<class 'slice'> slice(1, 2, 3)
In [132]: f[:, [1,2,3]]
<class 'tuple'> (slice(None, None, None), [1, 2, 3])
In [133]: f[:, :3]
<class 'tuple'> (slice(None, None, None), slice(None, 3, None))
In [134]: f[(slice(1,None),3)]
<class 'tuple'> (slice(1, None, None), 3)

For builtin classes like list, a tuple argument raises an error. But that's a class dependent issue, not a syntax one. numpy.ndarray accepts a tuple, as long as it's compatible with its shape.
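The class-dependent part can be seen by handing the same tuple to both types. This is only a sketch; the names idx, arr and nested are made up for the example.

```python
import numpy as np

idx = (slice(None), 1)          # what Python builds for [:, 1]

arr = np.arange(6).reshape(3, 2)
print(arr[idx])                 # same as arr[:, 1]: [1 3 5]

nested = [[0, 1], [2, 3], [4, 5]]
try:
    nested[idx]                 # list.__getitem__ rejects a tuple
except TypeError as exc:
    print("list:", exc)
```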

The syntax for a tuple index was added to Python to meet the needs of numpy. I don't think there are any builtin classes that use it.

The numpy.lib.index_tricks module has several classes that take advantage of this behavior. Look at its code for more ideas.

In [137]: np.s_[3:]
Out[137]: slice(3, None, None)
In [139]: np.r_['0,2,1',[1,2,3],[4,5,6]]
Out[139]: 
array([[1, 2, 3],
       [4, 5, 6]])
In [140]: np.c_[[1,2,3],[4,5,6]]
Out[140]: 
array([[1, 4],
       [2, 5],
       [3, 6]])
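One practical use of np.s_ is building an index once and reusing it. A small sketch, with a made-up array and the hypothetical name window:

```python
import numpy as np

x = np.arange(20).reshape(4, 5)

window = np.s_[:2, 1:4]          # a plain tuple of two slice objects
print(window)                    # (slice(None, 2, None), slice(1, 4, None))

# Indexing with the stored tuple is the same as writing the commas inline.
assert np.array_equal(x[window], x[:2, 1:4])
```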

Other "indexing" examples:

In [141]: f[...]
<class 'ellipsis'> Ellipsis
In [142]: f[[1,2,3]]
<class 'list'> [1, 2, 3]
In [143]: f[10]
<class 'int'> 10
In [144]: f[{1:12}]
<class 'dict'> {1: 12}

I don't know of any class that makes use of a dict argument, but the syntax allows it.

Cope with different slicing-behaviour in scipy.sparse and numpy

With numpy imported as np, and sparse imported via from scipy import sparse:

In [150]: arr = np.asarray([[0,0,0,1], [1,1,0,0], [1,0,1,0], [1,0,0,1], [1,0,0,1], [1,0,0,1]])
     ...: M = sparse.lil_matrix(arr) # Or another format like .csr_matrix etc.

A scalar index on an ndarray reduces the number of dimensions by one:

In [151]: arr[:,3]                                                                                           
Out[151]: array([1, 0, 0, 1, 1, 1])

The same index does not change the number of dimensions of a sparse matrix:

In [152]: M[:,3]                                                                                             
Out[152]: 
<6x1 sparse matrix of type '<class 'numpy.int64'>'
    with 4 stored elements in LInked List format>

This behavior is similar to that of the np.matrix subclass (and MATLAB). A sparse matrix is always 2d.

The dense array display of this matrix:

In [153]: M[:,3].A                                                                                           
Out[153]: 
array([[1],
       [0],
       [0],
       [1],
       [1],
       [1]], dtype=int64)

and the np.matrix display:

In [154]: M[:,3].todense()                                                                                   
Out[154]: 
matrix([[1],
        [0],
        [0],
        [1],
        [1],
        [1]], dtype=int64)

np.matrix has an A1 property which produces a 1d array (it converts to ndarray and applies ravel):

In [155]: M[:,3].todense().A1                                                                                
Out[155]: array([1, 0, 0, 1, 1, 1], dtype=int64)

ravel, squeeze and scalar indexing are all ways of reducing the dimensions of an ndarray. But they don't work directly on a np.matrix or sparse matrix.
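The ndarray versus np.matrix half of that statement can be checked without scipy. This is a sketch only; the column values are copied from the example above and the names col and m are assumptions.

```python
import numpy as np

col = np.array([[1], [0], [0], [1], [1], [1]])   # shape (6, 1)

# All three reduce a plain ndarray to 1d.
print(col.ravel().shape)      # (6,)
print(col.squeeze().shape)    # (6,)
print(col[:, 0].shape)        # (6,)

# np.matrix stays 2d under ravel; A1 is the way out.
m = np.matrix(col)
print(m.ravel().shape)        # (1, 6)
print(m.A1.shape)             # (6,)
```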

Another example of a 2d sparse matrix:

In [156]: sparse.lil_matrix(arr[:,3])                                                                        
Out[156]: 
<1x6 sparse matrix of type '<class 'numpy.int64'>'
    with 4 stored elements in LInked List format>
In [157]: _.A                                                                                                
Out[157]: array([[1, 0, 0, 1, 1, 1]], dtype=int64)

Note the [[...]]. sparse has added a leading size 1 dimension to the 1d ndarray.

Python indexing numpy matrix yields different results to printing matrix fully

It depends on what you mean by "33 element value". Here you are dealing with a 2D matrix.

For simplicity, I'll say "xth row" for the row at index x (strictly speaking, the (x+1)th row).

In the second case you are taking the element in the 33rd row and 32nd column; it is equivalent to array[33,32].

In the first case you are returning the 32nd row.

The important thing to note is that you are doing chained indexing here.

array[:,32] (which returns all the values of the 32nd column) is not equivalent to array[:][32], which first returns the whole array and then takes its 32nd row.
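A small sanity check of those statements, using a hypothetical 40x40 array as a stand-in for the one in the question (its real shape and contents are not known here):

```python
import numpy as np

array = np.arange(40 * 40).reshape(40, 40)   # stand-in data, not the asker's

# Two integer indices give a single element either way.
assert array[33][32] == array[33, 32]
# array[:] is the whole array, so [32] then selects row 32 ...
assert np.array_equal(array[:][32], array[32])
# ... whereas the comma form selects column 32.
assert np.array_equal(array[:, 32], array.T[32])
```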

What is x[ : , 0]?

This is numpy-specific notation for indexing arrays, not plain Python, which is why it does not work in your code. In your (initial) case, the sklearn object probably wraps a numpy array that supports the numpy slicing syntax.

In your specific case, it would work as follows:

import numpy as np

y = np.array([[1, 2], [2, 3], [3, 4]])
print(y[:, 0])

# prints: [1 2 3]

This would yield all indexes along the first axis (i.e. use full column vectors), but use only index 0 in the second axis (i.e. use only the first column vector).
