Numpy: Find First Index of Value Fast

There is a feature request for this scheduled for Numpy 2.0.0: https://github.com/numpy/numpy/issues/2269

Is there a NumPy function to return the first index of something in an array?

Yes. Given an array, array, and a value to search for, item, you can use np.where:

itemindex = numpy.where(array == item)

The result is a tuple of index arrays, one per dimension: for a 2D array, first all the row indices, then all the column indices.

For example, if the array is two-dimensional and contains your item at two locations, then

array[itemindex[0][0]][itemindex[1][0]]

would be equal to your item and so would be:

array[itemindex[0][1]][itemindex[1][1]]
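As a concrete illustration, here is a small sketch. The array contents and the names grid, rows, cols and first are made up for this example; they are not from the question.

```python
import numpy as np

# Hypothetical 3x3 array in which the value 7 occurs twice.
grid = np.array([[1, 7, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
item = 7

# np.where on a boolean mask returns one index array per dimension.
rows, cols = np.where(grid == item)

# Matches come back in row-major order, so position 0 is the first match.
# Guard against the no-match case, where both index arrays are empty.
first = (int(rows[0]), int(cols[0])) if rows.size else None
```

Note that np.where scans the whole array and collects every match, even if you only want the first one.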

Numpy first occurrence of value greater than existing value

This is a little faster (and looks nicer)

np.argmax(aa>5)

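This works because argmax stops at the first True ("In case of multiple occurrences of the maximum values, the indices corresponding to the first occurrence are returned.") and does not build a second array of indices. Be aware that argmax also returns 0 when no element satisfies the condition, so check for that case separately if it can occur.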

In [2]: N = 10000

In [3]: aa = np.arange(-N,N)

In [4]: timeit np.argmax(aa>N/2)
100000 loops, best of 3: 52.3 us per loop

In [5]: timeit np.where(aa>N/2)[0][0]
10000 loops, best of 3: 141 us per loop

In [6]: timeit np.nonzero(aa>N/2)[0][0]
10000 loops, best of 3: 142 us per loop
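One thing the timings do not show is what happens when nothing matches: argmax on an all-False mask returns 0, which looks like a valid index. A minimal sketch of a wrapper that handles this, assuming a 1D input (the function name first_index_greater and the -1 sentinel are choices made for this example):

```python
import numpy as np

def first_index_greater(values, threshold):
    """Return the index of the first element > threshold, or -1 if none."""
    mask = values > threshold
    if mask.size == 0:
        # argmax raises ValueError on an empty array.
        return -1
    idx = int(np.argmax(mask))
    # argmax returns 0 for an all-False mask, so confirm the hit is real.
    return idx if mask[idx] else -1

aa = np.arange(-5, 5)
print(first_index_greater(aa, 2))    # element 3 sits at index 8
print(first_index_greater(aa, 100))  # no match, so -1
```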

Find the index of the first occurrence of some value that is not X or Y in a numpy array

np.unique returns the first index of each number if you specify return_index=True. You can filter the result pretty easily using, e.g., np.isin:

u, i =  np.unique(vec, return_index=True)
result = i[np.isin(u, [51, 52], invert=True)]

The advantage of doing it this way is that u is a significantly reduced search space compared to the original data. Using invert=True also speeds things up a little compared to explicitly negating the resulting mask.

A version of np.isin that relies on the fact that the data is already sorted could be made using np.searchsorted like this:

def isin_sorted(a, i, invert=False):
    ind = np.searchsorted(a, i)
    # searchsorted can return a.size for values past the end, so clip to the last valid index
    ind = ind[a[ind.clip(max=a.size - 1)] == i]
    if invert:
        mask = np.ones(a.size, dtype=bool)
        mask[ind] = False
    else:
        mask = np.zeros(a.size, dtype=bool)
        mask[ind] = True
    return mask

You could use this version in place of np.isin, after calling np.unique, which always returns a sorted array. For sufficiently large vec and exclusion lists, it will be more efficient:

result = i[isin_sorted(u, [51, 52], invert=True)]
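To make the intermediate values visible, here is a small worked sketch with a made-up vec (the data and the names keep, candidates and first are assumptions for illustration). It also shows that result holds one first-occurrence index per remaining value, so the overall first index is its minimum.

```python
import numpy as np

# Hypothetical data: 51 and 52 are the values to skip.
vec = np.array([51, 52, 51, 7, 52, 3, 7])

# u is sorted; i[k] is the index of the first occurrence of u[k] in vec.
u, i = np.unique(vec, return_index=True)

# Keep only the unique values that are not 51 or 52.
keep = np.isin(u, [51, 52], invert=True)
candidates = i[keep]

# The earliest of the surviving first-occurrence indices, or -1 if none.
first = int(candidates.min()) if candidates.size else -1
```

Here vec[first] is 7, the first element that is neither 51 nor 52.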

numpy - return first index of element in array

You can use np.argwhere to get the matching indices as a 2D array, with one row of indices per match, and then take the first row:

np.argwhere(zArray==match)[0]

Alternatively, a faster option is to use argmax to get the index of the first match in the flattened array, and then np.unravel_index to convert it into a tuple of per-dimension indices:

np.unravel_index((zArray==match).argmax(), zArray.shape)

Sample run -

In [100]: zArray
Out[100]: 
array([[   0, 1200, 5000], # different from sample for a generic one
       [1320,   24, 5000],
       [5000,  234, 5230]])

In [101]: match
Out[101]: 5000

In [102]: np.argwhere(zArray==match)[0]
Out[102]: array([0, 2])

In [103]: np.unravel_index((zArray==match).argmax(), zArray.shape)
Out[103]: (0, 2)

Runtime test -

In [104]: a = np.random.randint(0,100,(1000,1000))

In [105]: %timeit np.argwhere(a==50)[0]
100 loops, best of 3: 2.41 ms per loop

In [106]: %timeit np.unravel_index((a==50).argmax(), a.shape)
1000 loops, best of 3: 493 µs per loop

Python/NumPy: find the first index of zero, then replace all elements with zero after that for each row

One way to accomplish question 1 is to use numpy.cumprod along each row; the running product becomes zero at the first zero and stays zero afterwards:

>>> np.cumprod(a, axis=1)
array([[1, 0, 0, 0, 0],
       [1, 1, 1, 1, 0],
       [1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0]])
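The output above is itself the zeroed array only when a contains nothing but 0s and 1s. For general values, a possible sketch is to take the cumulative product of the boolean mask a != 0 and multiply it back in. The array below and the names keep, result, first_zero and has_zero are made up for this example.

```python
import numpy as np

# Hypothetical input with arbitrary non-zero values.
a = np.array([[3, 0, 5, 2],
              [4, 1, 6, 0],
              [0, 9, 9, 9]])

# 1 while no zero has been seen in the row, 0 from the first zero onward.
keep = np.cumprod(a != 0, axis=1)

# Zero out everything from the first zero to the end of each row.
result = a * keep

# Column of the first zero in each row; only meaningful where has_zero is True,
# because argmax returns 0 for rows that contain no zero at all.
has_zero = (a == 0).any(axis=1)
first_zero = (a == 0).argmax(axis=1)
```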

Efficiently return the index of the first value satisfying condition in array

`numba`

With numba it's possible to optimise both scenarios. Syntactically, you need only construct a function with a simple for loop:

from numba import njit

@njit
def get_first_index_nb(A, k):
    for i in range(len(A)):
        if A[i] > k:
            return i
    return -1

idx = get_first_index_nb(A, 0.9)

Numba improves performance by JIT ("Just In Time") compiling code and leveraging CPU-level optimisations. A regular for loop without the @njit decorator would typically be slower than the methods you've already tried for the case where the condition is met late.

For a Pandas numeric series df['data'], you can simply feed the NumPy representation to the JIT-compiled function:

idx = get_first_index_nb(df['data'].values, 0.9)

Generalisation

Since numba permits functions as arguments, and assuming the passed function can also be JIT-compiled, you can arrive at a method to calculate the nth index where a condition is met for an arbitrary func.

@njit
def get_nth_index_count(A, func, count):
    c = 0
    for i in range(len(A)):
        if func(A[i]):
            c += 1
            if c == count:
                return i
    return -1

@njit
def func(val):
    return val > 0.9

# get index of 3rd value where func evaluates to True
idx = get_nth_index_count(arr, func, 3)

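For the 3rd last value, you can feed the reverse, arr[::-1], and subtract the result from len(arr) - 1, the - 1 being necessary to account for 0-indexing. A result of -1 means fewer than three matches were found and should be passed through unchanged.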
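The index arithmetic for the reversed case can be checked with a small sketch. To keep it runnable without numba, the loop below is a plain-Python copy of get_nth_index_count; the helper names nth_index and nth_last_index and the sample data are assumptions for this example.

```python
import numpy as np

def nth_index(A, func, count):
    # Same loop as get_nth_index_count above, without the @njit decorator.
    c = 0
    for i in range(len(A)):
        if func(A[i]):
            c += 1
            if c == count:
                return i
    return -1

def nth_last_index(A, func, count):
    # Search the reversed view, then map the position back to the original.
    rev = nth_index(A[::-1], func, count)
    # Keep the -1 sentinel instead of turning it into len(A).
    return -1 if rev == -1 else len(A) - 1 - rev

# Hypothetical data: values above 0.9 sit at indices 0, 2, 4 and 5.
arr = np.array([0.95, 0.1, 0.99, 0.2, 0.97, 0.91, 0.3])
third_last = nth_last_index(arr, lambda v: v > 0.9, 3)
```

With this data, third_last is 2: counting matches from the end gives indices 5, 4, then 2.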

Performance benchmarking

# Python 3.6.5, NumPy 1.14.3, Numba 0.38.0
import numpy as np
from numba import njit

np.random.seed(0)
arr = np.random.rand(10**7)
m = 0.9
n = 0.999999

@njit
def get_first_index_nb(A, k):
    for i in range(len(A)):
        if A[i] > k:
            return i
    return -1

def get_first_index_np(A, k):
    for i in range(len(A)):
        if A[i] > k:
            return i
    return -1

%timeit get_first_index_nb(arr, m)                                 # 375 ns
%timeit get_first_index_np(arr, m)                                 # 2.71 µs
%timeit next(iter(np.where(arr > m)[0]), -1)                       # 43.5 ms
%timeit next((idx for idx, val in enumerate(arr) if val > m), -1)  # 2.5 µs

%timeit get_first_index_nb(arr, n)                                 # 204 µs
%timeit get_first_index_np(arr, n)                                 # 44.8 ms
%timeit next(iter(np.where(arr > n)[0]), -1)                       # 21.4 ms
%timeit next((idx for idx, val in enumerate(arr) if val > n), -1)  # 39.2 ms

numpy find slice along an axis where the first and last occurring value occurs

You can probably optimize this to be faster, but here is a vectorized version of what you are looking for:

axis = 1
mask = np.where(x==val)[axis]
first, last = np.amin(mask), np.amax(mask)

It first finds every position of val in your array using np.where, takes the index array for the desired axis, and returns the minimum and maximum of those indices.
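A short worked sketch of the same idea, with made-up data (the array x and the name hits are assumptions for this example). It also guards against the case where val does not occur, since np.amin and np.amax raise ValueError on an empty array.

```python
import numpy as np

# Hypothetical 2D array in which val occupies columns 1 and 2.
x = np.array([[0, 0, 4, 0],
              [0, 4, 4, 0],
              [0, 0, 0, 0]])
val = 4
axis = 1

# Indices along the chosen axis of every element equal to val.
hits = np.where(x == val)[axis]

if hits.size:
    first, last = int(hits.min()), int(hits.max())
    # Slice covering every column that contains val.
    window = x[:, first:last + 1]
else:
    first = last = None
    window = None
```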
