Pandas - find index of value anywhere in DataFrame
Supposing your DataFrame looks like the following:
      0       1            2      3    4
0     a      er          tfr    sdf   34
1    rt     tyh          fgd    thy  rer
2     1       2            3      4    5
3     6       7            8      9   10
4   dsf     wew  security_id   name  age
5   dfs    bgbf          121  jason   34
6  dddp    gpot         5754   mike   37
7  fpoo  werwrw          342   jack   31
Do the following (note that get_value was deprecated and later removed from pandas; df.iat is the current positional accessor):
for row in range(df.shape[0]):  # df is the DataFrame
    for col in range(df.shape[1]):
        if df.iat[row, col] == 'security_id':
            print(row, col)
            break
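The nested loop works, but it touches every cell from Python. A vectorized sketch of the same lookup, using a small made-up frame; np.where returns the row and column positions of every matching cell at once:

```python
import numpy as np
import pandas as pd

# Toy frame (made up) standing in for the one above
df = pd.DataFrame([['a', 'er', 'tfr'],
                   ['dsf', 'security_id', 'name'],
                   ['121', 'jason', '34']])

# Vectorized alternative to the nested loop: compare every cell at once
rows, cols = np.where(df.to_numpy() == 'security_id')
print(rows[0], cols[0])  # row 1, column 1 in this toy frame
```

Unlike the loop, this returns all matches, not just the first one per row.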
Search for a value anywhere in a pandas DataFrame
You can perform equality comparison on the entire DataFrame:
df[df.eq(var1).any(axis=1)]
Python Pandas: Get index of rows which column matches certain value
df.iloc[i] returns the ith row of df. Here i does not refer to an index label; it is a 0-based positional index.
In contrast, the attribute index returns actual index labels, not numeric row positions:
df.index[df['BoolCol'] == True].tolist()
or equivalently,
df.index[df['BoolCol']].tolist()
You can see the difference quite clearly by playing with a DataFrame whose non-default index is not equal to the rows' numerical positions:
df = pd.DataFrame({'BoolCol': [True, False, False, True, True]},
                  index=[10, 20, 30, 40, 50])
In [53]: df
Out[53]:
BoolCol
10 True
20 False
30 False
40 True
50 True
[5 rows x 1 columns]
In [54]: df.index[df['BoolCol']].tolist()
Out[54]: [10, 40, 50]
If you want to use the index,
In [56]: idx = df.index[df['BoolCol']]
In [57]: idx
Out[57]: Int64Index([10, 40, 50], dtype='int64')
then you can select the rows using loc instead of iloc:
In [58]: df.loc[idx]
Out[58]:
BoolCol
10 True
40 True
50 True
[3 rows x 1 columns]
Note that loc can also accept boolean arrays:
In [55]: df.loc[df['BoolCol']]
Out[55]:
BoolCol
10 True
40 True
50 True
[3 rows x 1 columns]
If you have a boolean array, mask, and need ordinal index values, you can compute them using np.flatnonzero:
In [110]: np.flatnonzero(df['BoolCol'])
Out[110]: array([0, 3, 4])
Use df.iloc to select rows by ordinal index:
In [113]: df.iloc[np.flatnonzero(df['BoolCol'])]
Out[113]:
BoolCol
10 True
40 True
50 True
Finding the index for a value in a Pandas Dataframe
You're essentially looking for two conditions. For the first condition, you want the given value to be greater than 0.1:
df['value'].gt(0.1)
For the second condition, you want the previous non-null value to be less than 0.1:
df['value'].ffill().shift().lt(0.1)
Now, combine the two conditions with the & operator, reverse the resulting Boolean indexer, and use idxmax to find the first instance (i.e., the last in the original order) where your condition holds:
(df['value'].gt(0.1) & df['value'].ffill().shift().lt(0.1))[::-1].idxmax()
Which gives the expected index value.
The above method assumes that at least one value satisfies the condition you've described. If your data may not contain such a value, you may want to use any to verify that a solution exists:
# Build the condition.
cond = (df['value'].gt(0.1) & df['value'].ffill().shift().lt(0.1))[::-1]

# Check if the condition is met anywhere.
if cond.any():
    idx = cond.idxmax()
else:
    idx = ???  # no matching index exists; pick a sensible fallback
In your question, you've specified both inequalities as strict. What happens for a value exactly equal to 0.1? You may want to change one of gt/lt to ge/le to account for this.
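To make the recipe concrete, here is a minimal sketch on made-up data, where the values cross 0.1 after a NaN that ffill() carries over:

```python
import numpy as np
import pandas as pd

# Toy series (made up): the crossing happens at label 3, where 0.3 > 0.1
# and the previous non-null value (0.08, carried over the NaN) is < 0.1.
df = pd.DataFrame({'value': [0.05, 0.08, np.nan, 0.3, 0.2]})

cond = (df['value'].gt(0.1) & df['value'].ffill().shift().lt(0.1))[::-1]
if cond.any():
    idx = cond.idxmax()
    print(idx)  # 3
```

Note that ffill().shift() is what lets the NaN at label 2 be skipped: the "previous" value seen at label 3 is 0.08, not NaN.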
speed up pandas search for a certain value not in the whole df
Just to make a full answer out of my comment:
With -1 not in test1.values you can check whether -1 occurs anywhere in your DataFrame.
Regarding performance, this still needs to check every single value, which in your case is 10^5 * 10^2 = 10^7 comparisons. All you save is the cost of the summation and the additional comparison of those results.
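As a small sketch of that membership check (the test1 frame here is a tiny stand-in for the real 10^5 x 10^2 one):

```python
import numpy as np
import pandas as pd

# Small stand-in for test1
test1 = pd.DataFrame(np.arange(12).reshape(3, 4))

print(-1 not in test1.values)   # True: -1 does not occur anywhere
test1.iloc[1, 2] = -1
print(-1 not in test1.values)   # False once -1 is present
```

On a NumPy array, `x in arr` is equivalent to `(arr == x).any()`, which is why this works on the .values of the frame.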
How to find Value at specific index in an array in a dataframe?
This line isn't doing what you think it's doing:
w=AvgT.SRi[maxsa]
You are accessing the value of SRi in row maxsa of the dataframe -- that is, you are getting the whole list. I assume you are getting an IndexError because in at least one instance, the argmax of SAi is higher than the number of rows in your dataframe.
Try replacing that line with this:
w=AvgT.SRi[index][maxsa]
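A minimal sketch of that fix, assuming (as a hypothetical reconstruction of the question's setup) that each row of AvgT stores a list in SAi and a matching list in SRi:

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction: per-row lists of scores and labels
AvgT = pd.DataFrame({
    'SAi': [[0.2, 0.9, 0.4], [0.7, 0.1]],
    'SRi': [['a', 'b', 'c'], ['d', 'e']],
})

for index in AvgT.index:
    maxsa = np.argmax(AvgT.SAi[index])   # position of the max within this row's list
    w = AvgT.SRi[index][maxsa]           # element of this row's list, not the whole list
    print(w)                             # 'b', then 'd'
```

The key point is that AvgT.SRi[index] selects one row's list first; only then is maxsa a valid position into that list.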
Get the indexes for the top 3 values from a dataframe row (using a fast implementation)
You can use np.sort with axis=1, apply [:, ::-1] to reverse the sort order, and then [:, :3] to select the first 3 columns of the array. Then recreate the DataFrame:
# input
import numpy as np
import pandas as pd

np.random.seed(3)
df = pd.DataFrame(np.random.randint(0, 100, 100).reshape(10, 10),
                  columns=list('abcdefghij'))
# sort
top3 = pd.DataFrame(np.sort(df, axis=1)[:, ::-1][:,:3])
print(top3)
0 1 2
0 74 72 56
1 96 93 81
2 90 90 69
3 97 79 62
4 94 78 64
5 85 71 63
6 99 91 80
7 96 95 61
8 91 90 74
9 88 60 56
EDIT: OP changed the question to ask for the column names of the top 3 values per row; that can be done with argsort and slicing into the column names:
print(pd.DataFrame(df.columns.to_numpy()
[np.argsort(df.to_numpy(), axis=1)][:, -1:-4:-1]))
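A sketch verifying that slicing the argsort output does pick the per-row top-3 column names, using the same seed and frame as above:

```python
import numpy as np
import pandas as pd

np.random.seed(3)
df = pd.DataFrame(np.random.randint(0, 100, 100).reshape(10, 10),
                  columns=list('abcdefghij'))

# argsort gives ascending positions; [:, -1:-4:-1] takes the last three
# reversed, i.e. the column positions of the three largest values per row.
top3_cols = pd.DataFrame(df.columns.to_numpy()
                         [np.argsort(df.to_numpy(), axis=1)][:, -1:-4:-1])

# Looking those columns up row-wise reproduces the sorted top-3 values.
print(top3_cols.head(1))
```

Fancy-indexing the (10,) array of column names with the (10, 10) argsort result broadcasts to a (10, 10) array of names, which the final slice trims to the top three per row.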