Pandas - Find Index of Value Anywhere in Dataframe

Pandas - find index of value anywhere in DataFrame

Suppose your DataFrame looks like this:

      0       1            2      3    4
0     a      er          tfr    sdf   34
1    rt     tyh          fgd    thy  rer
2     1       2            3      4    5
3     6       7            8      9   10
4   dsf     wew  security_id   name  age
5   dfs    bgbf          121  jason   34
6  dddp    gpot         5754   mike   37
7  fpoo  werwrw          342   jack   31

You can loop over every cell and print the row and column position where the value is found:

for row in range(df.shape[0]):  # df is the DataFrame
    for col in range(df.shape[1]):
        # DataFrame.get_value was removed in pandas 1.0; .iat is the positional scalar accessor
        if df.iat[row, col] == 'security_id':
            print(row, col)
            break

Search for a value anywhere in a pandas DataFrame

You can perform equality comparison on the entire DataFrame:

df[df.eq(var1).any(axis=1)]
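If you need the location of the match rather than the matching rows, the same equality mask can be passed to np.where. This is only a minimal sketch: the small frame below is made-up sample data, and the names mask, positions and labels are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, loosely modelled on the frame in the question.
df = pd.DataFrame([
    ['a', 'er', 'tfr'],
    ['dsf', 'wew', 'security_id'],
    ['dfs', 'bgbf', '121'],
])

# Boolean frame that is True wherever a cell equals the target value.
mask = df.eq('security_id')

# np.where on the underlying array returns positional row and column indices.
rows, cols = np.where(mask.to_numpy())
positions = list(zip(rows.tolist(), cols.tolist()))

# Translate the positions into index and column labels.
labels = [(df.index[r], df.columns[c]) for r, c in positions]

print(positions)
print(labels)
```

Here positions and labels coincide only because the sample frame uses the default integer index and column names.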

Python Pandas: Get index of rows which column matches certain value

df.iloc[i] returns the ith row of df. Here i does not refer to the index label; it is a 0-based position.

In contrast, the attribute index returns actual index labels, not numeric row-indices:

df.index[df['BoolCol'] == True].tolist()

or equivalently,

df.index[df['BoolCol']].tolist()

You can see the difference quite clearly by playing with a DataFrame whose
non-default index does not match each row's numerical position:

df = pd.DataFrame({'BoolCol': [True, False, False, True, True]},
                  index=[10, 20, 30, 40, 50])

In [53]: df
Out[53]:
    BoolCol
10     True
20    False
30    False
40     True
50     True

[5 rows x 1 columns]

In [54]: df.index[df['BoolCol']].tolist()
Out[54]: [10, 40, 50]

If you want to use the index,

In [56]: idx = df.index[df['BoolCol']]

In [57]: idx
Out[57]: Int64Index([10, 40, 50], dtype='int64')

then you can select the rows using loc instead of iloc:

In [58]: df.loc[idx]
Out[58]:
    BoolCol
10     True
40     True
50     True

[3 rows x 1 columns]

Note that loc can also accept boolean arrays:

In [55]: df.loc[df['BoolCol']]
Out[55]:
    BoolCol
10     True
40     True
50     True

[3 rows x 1 columns]

If you have a boolean array, mask, and need ordinal index values, you can compute them using np.flatnonzero:

In [110]: np.flatnonzero(df['BoolCol'])
Out[110]: array([0, 3, 4])

Use df.iloc to select rows by ordinal index:

In [113]: df.iloc[np.flatnonzero(df['BoolCol'])]
Out[113]:
    BoolCol
10     True
40     True
50     True
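To tie the two views together, here is a small sketch that rebuilds the example frame and converts between index labels and ordinal positions. Index.get_indexer is used for the label-to-position step; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'BoolCol': [True, False, False, True, True]},
                  index=[10, 20, 30, 40, 50])

# Index labels of the True rows (what .loc expects).
labels = df.index[df['BoolCol']].tolist()

# Ordinal positions of the True rows (what .iloc expects).
positions = np.flatnonzero(df['BoolCol'].to_numpy()).tolist()

# Index.get_indexer maps labels back to their ordinal positions.
roundtrip = df.index.get_indexer(labels).tolist()

# Both selections pick the same rows.
same_rows = df.loc[labels].equals(df.iloc[positions])

print(labels, positions, roundtrip, same_rows)
```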

Finding the index for a value in a Pandas Dataframe

You're essentially looking for two conditions. For the first condition, you want the given value to be greater than 0.1:

df['value'].gt(0.1)

For the second condition, you want the previous non-null value to be less than 0.1:

df['value'].ffill().shift().lt(0.1)

Now combine the two conditions with the & operator. idxmax returns the first instance where the condition holds, so reverse the resulting Boolean indexer before calling it to get the last instance instead:

(df['value'].gt(0.1) & df['value'].ffill().shift().lt(0.1))[::-1].idxmax()

Which gives the expected index value.

The above method assumes that at least one value satisfies the condition you've described. If your data might not contain such a value, use any to verify that a match exists:

# Build the condition.
cond = (df['value'].gt(0.1) & df['value'].ffill().shift().lt(0.1))[::-1]

# Check if the condition is met anywhere.
if cond.any():
    idx = cond.idxmax()
else:
    idx = None  # placeholder: choose whatever fallback suits your data

In your question, you've specified both inequalities to be strict. What happens for a value exactly equal to 0.1? You may want to change one of the gt/lt to ge/le to account for this.
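As a worked illustration of the steps above, here is a sketch on made-up data. Only the column name value and the 0.1 threshold come from the question; the numbers themselves are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data containing one missing value.
df = pd.DataFrame({'value': [0.05, np.nan, 0.2, 0.3, 0.05, 0.15, 0.4]})

# Condition 1: the current value is above the threshold.
above = df['value'].gt(0.1)

# Condition 2: the previous non-null value is below the threshold.
prev_below = df['value'].ffill().shift().lt(0.1)

cond = above & prev_below

# idxmax on a Boolean Series returns the label of the first True.
first_idx = cond.idxmax() if cond.any() else None

# Reversing first makes idxmax return the label of the last True.
last_idx = cond[::-1].idxmax() if cond.any() else None

print(first_idx, last_idx)
```

In this data the condition holds at rows 2 and 5, so the unreversed call finds row 2 and the reversed call finds row 5.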

speed up pandas search for a certain value not in the whole df

Just to make a full answer out of my comment:

With -1 not in test1.values you can check whether -1 is absent from your DataFrame.

Regarding performance, this still has to check every single value, which in your case is

10^5*10^2 = 10^7.

All it saves is the cost of the summation and of the extra comparison on its result.
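A small sketch of that membership check, assuming a tiny stand-in for test1 (the real frame in the question is far larger, and the contents here are invented):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for test1.
test1 = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('abcd'))

# True while no cell equals -1.
missing_before = -1 not in test1.values

# Plant a -1 and check again.
test1.iloc[1, 2] = -1
missing_after = -1 not in test1.values

# The same test written as an explicit elementwise comparison.
same_check = not (test1.to_numpy() == -1).any()

print(missing_before, missing_after, same_check)
```

Both spellings compare every element, which is why the cost still grows with the total number of cells.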

How to find Value at specific index in an array in a dataframe?

This line isn't doing what you think it's doing:

    w=AvgT.SRi[maxsa]

You are accessing the value of SRi in row maxsa of the dataframe -- that is, you are getting the whole list. I assume you are getting an IndexError because in at least one instance, the argmax of SAi is higher than the number of rows in your dataframe.

Try replacing that line with this:

    w=AvgT.SRi[index][maxsa]
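To make the difference concrete, here is a minimal sketch. The names AvgT, SAi, SRi, index and maxsa follow the question, but the data and the surrounding loop are assumptions, since the original loop is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which every cell holds a list.
AvgT = pd.DataFrame({
    'SAi': [[1, 5, 2], [7, 3, 4]],
    'SRi': [[10, 20, 30], [40, 50, 60]],
})

results = []
for index in AvgT.index:
    # Position of the largest SAi entry within this row's list.
    maxsa = int(np.argmax(AvgT.SAi[index]))
    # Pick the row first, then the position inside that row's list.
    w = AvgT.SRi[index][maxsa]
    results.append(w)

print(results)
```

AvgT.SRi[maxsa] on its own would instead return the entire list stored in row maxsa.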

Get the indexes for the top 3 values from a dataframe row (using a fast implementation)

You can use np.sort with axis=1, then [:, ::-1] to reverse the order of the sort and [:, :3] to select the first 3 columns of the array. Then recreate the DataFrame:

# input
import numpy as np
import pandas as pd

np.random.seed(3)
df = pd.DataFrame(np.random.randint(0, 100, 100).reshape(10, 10),
                  columns=list('abcdefghij'))

# sort
top3 = pd.DataFrame(np.sort(df, axis=1)[:, ::-1][:,:3])
print(top3)
    0   1   2
0  74  72  56
1  96  93  81
2  90  90  69
3  97  79  62
4  94  78  64
5  85  71  63
6  99  91  80
7  96  95  61
8  91  90  74
9  88  60  56

EDIT: OP changed the question to extract the column names of the top 3 values per row. That can be done with argsort and by slicing the column names:

print(pd.DataFrame(df.columns.to_numpy()
                   [np.argsort(df.to_numpy(), axis=1)][:, -1:-4:-1]))
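The same idea on a tiny frame with no ties, so the result is easy to check by hand. This is a sketch with made-up values; top_pos and top_names are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical data with distinct values in each row.
df = pd.DataFrame([[5, 9, 1, 7],
                   [3, 2, 8, 6]], columns=list('abcd'))

# argsort is ascending, so reverse each row and keep the first 3 positions.
top_pos = np.argsort(df.to_numpy(), axis=1)[:, ::-1][:, :3]

# Index the column labels with the position array to get the names.
top_names = df.columns.to_numpy()[top_pos]

print(pd.DataFrame(top_names, index=df.index))
```

With tied values the order among the ties depends on the sort algorithm, so treat the ordering of equal entries as unspecified.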

