How to Extract the Entire Row and Columns When Condition Met in Numpy Array

You can slice the array inside np.where() so that only the first column is tested:

import numpy as np

form = np.array([['A', 2.2, 9.1],
                 ['A', 7.1, 2.3],
                 ['B', 4.1, 1.1]], dtype=object)

# np.where returns the indices of the rows whose first column equals 'A'
j = form[np.where(form[:, 0] == 'A')]
print(j)
# [['A' 2.2 9.1]
#  ['A' 7.1 2.3]]
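As a minimal sketch of the same selection, the comparison result can also be used directly as a boolean mask, without np.where(). The names mask, rows_a and rows_a_where are illustrative and not part of the original answer.

```python
import numpy as np

# Same sample data as above; dtype=object keeps strings and floats side by side.
form = np.array([['A', 2.2, 9.1],
                 ['A', 7.1, 2.3],
                 ['B', 4.1, 1.1]], dtype=object)

# Comparing the first column gives one boolean per row.
mask = form[:, 0] == 'A'

# Indexing with the mask keeps the rows where it is True.
rows_a = form[mask]

# np.where(mask) returns the matching row indices, which select the same rows.
rows_a_where = form[np.where(mask)]
```

Both forms return whole rows; the mask version just skips the intermediate index array.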

Extracting specific columns in numpy array by condition

In Python, the expression -0.4 < x_y_z[2] < 0.1 is roughly equivalent to -0.4 < x_y_z[2] and x_y_z[2] < 0.1. The and operator decides the truth value of each part of the expression by converting it into a bool. Unlike Python lists and tuples, NumPy arrays with more than one element do not support this conversion: they raise a ValueError because the truth value is ambiguous.

The correct way to specify the condition is with bitwise & (which is unambiguous and non-short-circuiting), rather than the implicit and (which short circuits and is ambiguous in this case):

condition = (x_y_z[2, :] > -0.4) & (x_y_z[2, :] < 0.1)

condition is a boolean mask that selects the columns you want. Apply it along the column axis and keep every row with a plain slice:

selection = x_y_z[:, condition] 
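A small runnable sketch of the steps above, assuming x_y_z is a 3-row array whose third row holds the values being tested. The sample numbers and the chained_ok flag are made up for illustration.

```python
import numpy as np

# Hypothetical data: row 0 holds x, row 1 holds y, row 2 holds z.
x_y_z = np.array([[1.0, 2.0, 3.0, 4.0],
                  [5.0, 6.0, 7.0, 8.0],
                  [-0.5, -0.2, 0.0, 0.3]])

# The chained comparison needs a single bool, so NumPy raises ValueError.
try:
    -0.4 < x_y_z[2] < 0.1
    chained_ok = True
except ValueError:
    chained_ok = False

# Element-wise AND of the two comparisons gives one boolean per column.
condition = (x_y_z[2, :] > -0.4) & (x_y_z[2, :] < 0.1)

# Keep every row, but only the columns where the condition holds.
selection = x_y_z[:, condition]
```

Here only the two middle columns have a z value inside the open interval (-0.4, 0.1), so selection has shape (3, 2).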

How to extract rows from a numpy array, that meet several conditions?

For convenience, define Timestamp as a np.datetime64 creator:

In [492]: Timestamp=lambda x: np.datetime64(x, 's')
In [493]: Timestamp('2018-01-15 01:59:00')
Out[493]: numpy.datetime64('2018-01-15T01:59:00')
In [494]: original = np.array([[Timestamp('2018-01-15 01:59:00'), 329, 30, 5],
...: [Timestamp('2018-01-15 01:59:00'), 326, 25, 3],
...: [Timestamp('2018-01-15 02:00:00'), 324, 22, 34],
...: [Timestamp('2018-01-15 21:57:00'), 322, 23, 3],
...: [Timestamp('2018-01-15 21:57:00'), 323, 30, 9],
...: [Timestamp('2018-01-15 21:59:00'), 323, 1, 19]], dtype=object)
...:
In [495]: original
Out[495]:
array([[numpy.datetime64('2018-01-15T01:59:00'), 329, 30, 5],
[numpy.datetime64('2018-01-15T01:59:00'), 326, 25, 3],
[numpy.datetime64('2018-01-15T02:00:00'), 324, 22, 34],
[numpy.datetime64('2018-01-15T21:57:00'), 322, 23, 3],
[numpy.datetime64('2018-01-15T21:57:00'), 323, 30, 9],
[numpy.datetime64('2018-01-15T21:59:00'), 323, 1, 19]],
dtype=object)

Now we can do the time test with:

In [500]: original[:,0]<Timestamp('2018-01-15 06:00:00')
Out[500]: array([ True, True, True, False, False, False])
In [501]: original[:,0]>Timestamp('2018-01-15 01:00:00')
Out[501]: array([ True, True, True, True, True, True])
In [502]: mask = Out[500] & Out[501]
In [503]: mask
Out[503]: array([ True, True, True, False, False, False])

Test on columns 2 and 3:

In [509]: (original[:,[2,3]]>=30).any(axis=1)
Out[509]: array([ True, False, True, False, True, False])

and

In [506]: (original[:,2]>(original[:,3]*2)) | (original[:,3]>=(original[:,2]*2))
...:
Out[506]: array([ True, True, False, True, True, True])

and together

In [510]: mask & Out[509] & Out[506]
Out[510]: array([ True, False, False, False, False, False])
In [511]: np.where(Out[510])
Out[511]: (array([0]),)

Sometimes object dtype hinders calculations, usually because a function can't delegate the task to methods of the objects. Here the Python integers can be compared, so object arrays can also be compared. In a large array these comparisons might be faster if part of the array were first converted to a 2d numeric array.

In [512]: original[:,1:].astype(int)
Out[512]:
array([[329, 30, 5],
[326, 25, 3],
[324, 22, 34],
[322, 23, 3],
[323, 30, 9],
[323, 1, 19]])

Pandas seems to be 'happier' dealing with object dtypes, but I think that flexibility comes at a speed cost.
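The typed-array idea can be sketched as follows. This is an assumption-laden variant rather than the session above: the timestamps are stored in their own datetime64 array and the numeric columns in an int array, and the names times, values, in_window, any_ge_30, one_doubles_other and combined are illustrative.

```python
import numpy as np

# Assumption: timestamps live in a datetime64 array and the numeric columns
# in an int array, instead of one object-dtype array.
times = np.array(['2018-01-15T01:59:00', '2018-01-15T01:59:00',
                  '2018-01-15T02:00:00', '2018-01-15T21:57:00',
                  '2018-01-15T21:57:00', '2018-01-15T21:59:00'],
                 dtype='datetime64[s]')
values = np.array([[329, 30, 5],
                   [326, 25, 3],
                   [324, 22, 34],
                   [322, 23, 3],
                   [323, 30, 9],
                   [323, 1, 19]])

# Time window: strictly between 01:00 and 06:00.
in_window = ((times > np.datetime64('2018-01-15T01:00:00')) &
             (times < np.datetime64('2018-01-15T06:00:00')))

# values[:, 1] and values[:, 2] correspond to columns 2 and 3 of the original.
a, b = values[:, 1], values[:, 2]
any_ge_30 = (values[:, 1:] >= 30).any(axis=1)
one_doubles_other = (a > b * 2) | (b >= a * 2)

# Combine all tests, then pull out the matching rows.
combined = in_window & any_ge_30 & one_doubles_other
matching_values = values[combined]
matching_times = times[combined]
```

With this sample data only the first row passes all three tests, which agrees with the np.where result shown earlier.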

Selecting specific rows from an array when a condition is met in python

Instead of a for loop, you can use numpy masks, which are more efficient.

With your problem:

import numpy as np
iou = np.random.rand(300,4)
indices = np.where((iou < 0.5).all(axis=1))
negative_boxes = iou[indices]

Then indices holds the indices of the rows in which every value is smaller than 0.5 (np.where returns them as a one-element tuple), and negative_boxes contains only those rows.
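As a hedged illustration, the boolean mask can also index the array directly, which gives the same rows as going through np.where(). The seeded generator and the names row_mask and negative_boxes_where are assumptions added for reproducibility, not part of the original answer.

```python
import numpy as np

# Seeded generator so the example is reproducible; the data are random stand-ins.
rng = np.random.default_rng(0)
iou = rng.random((300, 4))

# One boolean per row: True when every value in that row is below 0.5.
row_mask = (iou < 0.5).all(axis=1)

# Boolean indexing and np.where-based indexing select the same rows.
negative_boxes = iou[row_mask]
negative_boxes_where = iou[np.where(row_mask)]
```

Use np.where() when you need the row indices themselves; otherwise the mask alone is enough.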

Python - Select row in NumPy array where multiple conditions are met

You could use a & b to perform element-wise AND for two boolean arrays a, b:

selected_rows = x[(x[:,1] == 'ADC01') & (x[:,2] == 'Input2')]

Similarly, use a | b for OR and ~a for NOT.
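A short sketch of the three operators on a made-up table. The contents of x and the names is_adc01, is_input2, both, either and neither are hypothetical.

```python
import numpy as np

# Hypothetical table: id, device, channel.
x = np.array([['1', 'ADC01', 'Input1'],
              ['2', 'ADC01', 'Input2'],
              ['3', 'ADC02', 'Input2'],
              ['4', 'ADC03', 'Input3']])

is_adc01 = x[:, 1] == 'ADC01'
is_input2 = x[:, 2] == 'Input2'

both = x[is_adc01 & is_input2]        # AND: row '2'
either = x[is_adc01 | is_input2]      # OR: rows '1', '2', '3'
neither = x[~(is_adc01 | is_input2)]  # NOT: row '4'
```

The parentheses around each comparison matter when the conditions are written inline, because & and | bind more tightly than == in Python.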

Selecting rows depending on conditions for multiple columns

You can use the following approach:

>>> test_array[np.where(np.all(test_array[:,[0,3]]==[1,5],axis=1))]
array([[1, 2, 3, 5]])
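The one-liner can be unpacked step by step. The contents of test_array below are an assumption chosen so that exactly one row matches; the original question's data is not shown.

```python
import numpy as np

# Hypothetical data; only the second row has 1 in column 0 and 5 in column 3.
test_array = np.array([[1, 2, 3, 4],
                       [1, 2, 3, 5],
                       [2, 2, 3, 5]])

# Compare columns 0 and 3 against the target pair; broadcasting pairs
# column 0 with 1 and column 3 with 5.
pair_matches = test_array[:, [0, 3]] == [1, 5]

# A row qualifies only if both comparisons are True.
row_mask = pair_matches.all(axis=1)
result = test_array[row_mask]
```

Indexing with row_mask gives the same rows as wrapping it in np.where().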

Numpy array: How to extract whole rows based on values in a column

Just build a mask and use it.

mask = np.logical_and(my_array[:, 1] >= 55, my_array[:, 1] <= 65)
desired_array = my_array[mask]
desired_array
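A runnable sketch with made-up data, assuming column 1 holds the value to filter on. It also shows that np.logical_and and the & operator build the same mask; mask_func and mask_op are illustrative names.

```python
import numpy as np

# Hypothetical data: column 1 holds the value being filtered.
my_array = np.array([[0, 50],
                     [1, 55],
                     [2, 60],
                     [3, 65],
                     [4, 70]])

# Two equivalent ways to build the inclusive range mask 55 <= value <= 65.
mask_func = np.logical_and(my_array[:, 1] >= 55, my_array[:, 1] <= 65)
mask_op = (my_array[:, 1] >= 55) & (my_array[:, 1] <= 65)

# Whole rows are kept wherever the mask is True.
desired_array = my_array[mask_func]
```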

