Python Pandas: Get Index of Rows Which Column Matches Certain Value

Python Pandas: Get index of rows where column matches certain value

df.iloc[i] returns the ith row of df. Here i is a 0-based positional index, not an index label.

In contrast, the attribute index returns actual index labels, not numeric row-indices:

df.index[df['BoolCol'] == True].tolist()

or equivalently,

df.index[df['BoolCol']].tolist()

You can see the difference quite clearly by playing with a DataFrame with
a non-default index that is not equal to the rows' numerical positions:

df = pd.DataFrame({'BoolCol': [True, False, False, True, True]},
                  index=[10, 20, 30, 40, 50])

In [53]: df
Out[53]:
   BoolCol
10    True
20   False
30   False
40    True
50    True

[5 rows x 1 columns]

In [54]: df.index[df['BoolCol']].tolist()
Out[54]: [10, 40, 50]

If you want to use the index,

In [56]: idx = df.index[df['BoolCol']]

In [57]: idx
Out[57]: Int64Index([10, 40, 50], dtype='int64')

then you can select the rows using loc instead of iloc:

In [58]: df.loc[idx]
Out[58]:
    BoolCol
10     True
40     True
50     True

[3 rows x 1 columns]

Note that loc can also accept boolean arrays:

In [55]: df.loc[df['BoolCol']]
Out[55]:
    BoolCol
10     True
40     True
50     True

[3 rows x 1 columns]

If you have a boolean array, mask, and need ordinal index values, you can compute them using np.flatnonzero:

In [110]: np.flatnonzero(df['BoolCol'])
Out[110]: array([0, 3, 4])

Use df.iloc to select rows by ordinal index:

In [113]: df.iloc[np.flatnonzero(df['BoolCol'])]
Out[113]:
    BoolCol
10     True
40     True
50     True

Get index of rows which match a certain value for the whole dataset?

You can use np.where, if I understand you correctly:

r, c = np.where(df == df.to_numpy().max())

This will return the positional row and column indices of every cell in the dataframe that equals the maximum value (99 in the original question).

Now, using

indx = df.index[r]
cols = df.columns[c]

you get the corresponding index and column labels. You can then zip them to get (row, column) coordinates:

coords = list(zip(indx, cols))
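Putting the steps above together on a small hypothetical dataframe (the data and labels here are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe; the maximum value 99 appears in two cells
df = pd.DataFrame({'a': [1, 99, 3],
                   'b': [99, 5, 6]},
                  index=['x', 'y', 'z'])

# Positional (row, column) indices of every cell equal to the max
r, c = np.where(df == df.to_numpy().max())

# Translate positions into index and column labels
indx = df.index[r]
cols = df.columns[c]
coords = list(zip(indx, cols))
print(coords)  # [('x', 'b'), ('y', 'a')]
```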

Index of row and column which contain specific element

You can use df.where to mask everything other than your value and then stack it to flatten & get rid of NaNs, and lastly check the index of what remains to get the desired output:

df.where(df.eq(value)).stack().index.tolist()

An example:

>>> df = pd._testing.makeMixedDataFrame()
>>> df
     A    B     C          D
0  0.0  0.0  foo1 2009-01-01
1  1.0  1.0  foo2 2009-01-02
2  2.0  0.0  foo3 2009-01-05
3  3.0  1.0  foo4 2009-01-06
4  4.0  0.0  foo5 2009-01-07

>>> value = 1
>>> df.where(df.eq(value)).stack().index.tolist()
[(1, 'A'), (1, 'B'), (3, 'B')]

Intermediate steps:

>>> df.where(df.eq(value))
     A    B   C   D
0  NaN  NaN NaN NaT
1  1.0  1.0 NaN NaT
2  NaN  NaN NaN NaT
3  NaN  1.0 NaN NaT
4  NaN  NaN NaN NaT

>>> _.stack()
1  A    1.0
   B    1.0
3  B    1.0
dtype: object

>>> _.index
MultiIndex([(1, 'A'),
            (1, 'B'),
            (3, 'B')],
           )

Get index of row where column value changes from previous row

Use Series.diff and compare with lt to test for values less than 0, then use boolean indexing to get the indices:

m = df1.val.diff().lt(0)
#if need test less like -7
#m = df1.val.diff().lt(-7)
one = df1.index[~m]
two = df1.index[m]
print (one)
Int64Index([0, 1, 3, 5], dtype='int64')

print (two)
Int64Index([2, 4], dtype='int64')

If need lists:

one = df1.index[~m].tolist()
two = df1.index[m].tolist()

Details:

print (df1.val.diff())

0       NaN
1      0.02
2     -8.80
3     10.55
4    -15.06
5    917.49
Name: val, dtype: float64
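For completeness, df1 itself is not shown above; here is a hypothetical df1 whose values were chosen so that val.diff() reproduces the output just listed:

```python
import pandas as pd

# Hypothetical data: successive differences are
# 0.02, -8.80, 10.55, -15.06, 917.49 as in the output above
df1 = pd.DataFrame({'val': [10.00, 10.02, 1.22, 11.77, -3.29, 914.20]})

m = df1.val.diff().lt(0)        # True where the value dropped from the previous row
one = df1.index[~m].tolist()    # rows with no drop (diff >= 0, or the first row)
two = df1.index[m].tolist()     # rows where the value decreased
print(one)  # [0, 1, 3, 5]
print(two)  # [2, 4]
```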

Python Pandas: Getting the Index of All Rows that Match a Column Value

You can use pd.Series.isin for this:

res = df[df['Channel'].isin({'A', 'B'})]

print(res)

#      Rec Channel  Value1  Value2
# 2  Event       A      23    39.0
# 4   Post       A      79    11.0
# 5   Post       B      88    69.0

To return the row with index label 2:

res2 = res.loc[2]

print(res2)

# Rec        Event
# Channel        A
# Value1        23
# Value2        39
# Name: 2, dtype: object
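A self-contained version of this example; only the rows with Channel A or B appear in the output above, so the remaining rows here are made-up filler:

```python
import pandas as pd

# Hypothetical dataframe; rows 2, 4, 5 match the output shown above,
# the other rows are assumptions added to make the example runnable
df = pd.DataFrame({'Rec':     ['Pre', 'Pre', 'Event', 'Pre', 'Post', 'Post'],
                   'Channel': ['C', 'C', 'A', 'C', 'A', 'B'],
                   'Value1':  [10, 11, 23, 12, 79, 88],
                   'Value2':  [1.0, 2.0, 39.0, 3.0, 11.0, 69.0]})

# Keep only the rows whose Channel is A or B
res = df[df['Channel'].isin({'A', 'B'})]
print(res.index.tolist())  # [2, 4, 5]

# Select the row whose index label is 2
res2 = res.loc[2]
print(res2['Value1'])  # 23
```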

Get the indexes of the values which are greater than 0 in the column of a dataframe

This should do the trick:

ans = df.index[df['Column_name']>0].tolist()

ans will be the list of indexes of the values that are greater than 0 in the column "Column_name".
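A minimal runnable sketch, using made-up data and the hypothetical column name from above:

```python
import pandas as pd

# Hypothetical example with a column named 'Column_name'
df = pd.DataFrame({'Column_name': [-3, 5, 0, 8, -1]})

# Index labels of the rows whose value is strictly positive
ans = df.index[df['Column_name'] > 0].tolist()
print(ans)  # [1, 3]
```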


How can I get the index values in DF1 to where DF1's column values match DF2's custom multiindex values?

Not sure if this answers your query, but if we first call reset_index on df1 to turn its index into a column (named 'index' by default), then set_index on ['Name', 'Age', 'Gender'], we can look up df2's keys and read off the matching 'index' values.

So that would be:

df1.reset_index().set_index(['Name','Age','Gender']).loc[df2.set_index(['Name','Age','Gender']).index]['index'].values
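A worked sketch with hypothetical frames (the names, ages, and extra Score column are invented for illustration):

```python
import pandas as pd

# Hypothetical frames: df1 carries extra columns, df2 holds the
# (Name, Age, Gender) keys to look up
df1 = pd.DataFrame({'Name':   ['Ann', 'Bob', 'Cid', 'Dee'],
                    'Age':    [25, 30, 35, 40],
                    'Gender': ['F', 'M', 'M', 'F'],
                    'Score':  [88, 92, 75, 60]})
df2 = pd.DataFrame({'Name':   ['Bob', 'Dee'],
                    'Age':    [30, 40],
                    'Gender': ['M', 'F']})

# reset_index turns df1's index into a column named 'index';
# aligning both frames on the key columns lets .loc pull the matches
matched = (df1.reset_index()
              .set_index(['Name', 'Age', 'Gender'])
              .loc[df2.set_index(['Name', 'Age', 'Gender']).index]['index']
              .values)
print(matched.tolist())  # [1, 3]
```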


Related Topics



Leave a reply



Submit