Python Pandas: Get index of rows where column matches certain value
df.iloc[i] returns the ith row of df. Here i does not refer to the index label; i is a 0-based position.
In contrast, the attribute index returns the actual index labels, not numeric row positions:
df.index[df['BoolCol'] == True].tolist()
or equivalently,
df.index[df['BoolCol']].tolist()
You can see the difference quite clearly by playing with a DataFrame whose
non-default index does not match each row's numerical position:
df = pd.DataFrame({'BoolCol': [True, False, False, True, True]},
index=[10,20,30,40,50])
In [53]: df
Out[53]:
BoolCol
10 True
20 False
30 False
40 True
50 True
[5 rows x 1 columns]
In [54]: df.index[df['BoolCol']].tolist()
Out[54]: [10, 40, 50]
If you want to use the index,
In [56]: idx = df.index[df['BoolCol']]
In [57]: idx
Out[57]: Int64Index([10, 40, 50], dtype='int64')
then you can select the rows using loc instead of iloc:
In [58]: df.loc[idx]
Out[58]:
BoolCol
10 True
40 True
50 True
[3 rows x 1 columns]
Note that loc can also accept boolean arrays:
In [55]: df.loc[df['BoolCol']]
Out[55]:
BoolCol
10 True
40 True
50 True
[3 rows x 1 columns]
If you have a boolean array, mask, and need ordinal index values, you can compute them using np.flatnonzero:
In [110]: np.flatnonzero(df['BoolCol'])
Out[110]: array([0, 3, 4])
Use df.iloc to select rows by ordinal index:
In [113]: df.iloc[np.flatnonzero(df['BoolCol'])]
Out[113]:
BoolCol
10 True
40 True
50 True
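The label-versus-position distinction above can be condensed into one self-contained script. This is only a minimal sketch that reuses the example frame from this answer; the names labels and positions are illustrative and not part of the original.

```python
import numpy as np
import pandas as pd

# Same frame as above: the index labels differ from the row positions.
df = pd.DataFrame({'BoolCol': [True, False, False, True, True]},
                  index=[10, 20, 30, 40, 50])

mask = df['BoolCol']

# Index labels of the matching rows (use these with df.loc).
labels = df.index[mask].tolist()

# 0-based positions of the matching rows (use these with df.iloc).
positions = np.flatnonzero(mask.to_numpy()).tolist()

print(labels)
print(positions)

# Both routes select the same rows.
assert df.loc[labels].equals(df.iloc[positions])
```

Which one you want depends on what you do next: labels survive filtering and sorting, while positions are only meaningful for the frame in its current order.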
Get index of rows which matches certain value for whole dataset?
You can use np.where, if I understand you correctly:
r, c = np.where(df == df.to_numpy().max())
This returns the positional row and column indices of every cell in the dataframe that equals the maximum value (99 here).
Now use
indx = df.index[r]
cols = df.columns[c]
to get the corresponding row and column labels. You can then zip them to get (row, column) coordinates:
coords = list(zip(indx, cols))
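As a rough end-to-end illustration of the np.where approach, here is a small sketch on made-up data. The frame, its labels and its values are assumptions chosen for the example; the comparison is done on df.to_numpy() so that np.where receives a plain boolean array.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the maximum (99) occurs in two cells.
df = pd.DataFrame({'a': [1, 99, 3], 'b': [99, 5, 6]},
                  index=['x', 'y', 'z'])

values = df.to_numpy()

# r and c hold the positional row/column indices of the matching cells.
r, c = np.where(values == values.max())

# Translate positions into labels and pair them up.
indx = df.index[r]
cols = df.columns[c]
coords = list(zip(indx, cols))
print(coords)
```

np.where scans the array row by row, so the pairs come out ordered by row first and then by column.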
Index of row and column which contain specific element
You can use df.where to mask everything other than your value, then stack it to flatten and get rid of NaNs, and lastly check the index of what remains to get the desired output:
df.where(df.eq(value)).stack().index.tolist()
An example:
>>> df = pd._testing.makeMixedDataFrame()
>>> df
A B C D
0 0.0 0.0 foo1 2009-01-01
1 1.0 1.0 foo2 2009-01-02
2 2.0 0.0 foo3 2009-01-05
3 3.0 1.0 foo4 2009-01-06
4 4.0 0.0 foo5 2009-01-07
>>> value = 1
>>> df.where(df.eq(value)).stack().index.tolist()
[(1, "A"), (1, "B"), (3, "B")]
Intermediate steps:
>>> df.where(df.eq(value))
A B C D
0 NaN NaN NaN NaT
1 1.0 1.0 NaN NaT
2 NaN NaN NaN NaT
3 NaN 1.0 NaN NaT
4 NaN NaN NaN NaT
>>> _.stack()
1 A 1.0
B 1.0
3 B 1.0
dtype: object
>>> _.index
MultiIndex([(1, "A"),
(1, "B"),
(3, "B")],
)
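The example above builds its frame with a helper from the private pd._testing module, which may not be available in every pandas installation. The following is a sketch of a closely related variant on an explicitly constructed frame (hypothetical data mirroring columns A and B above). Instead of relying on stack to drop the NaNs left by where, it stacks the boolean mask itself and keeps the True entries; this is a swapped-in technique, not the exact expression from the answer.

```python
import pandas as pd

# Hypothetical numeric frame mirroring columns A and B of the example above.
df = pd.DataFrame({'A': [0.0, 1.0, 2.0, 3.0, 4.0],
                   'B': [0.0, 1.0, 0.0, 1.0, 0.0]})
value = 1

# Stack the boolean mask into a Series indexed by (row label, column label),
# then keep only the entries that are True.
hits = df.eq(value).stack()
coords = hits[hits].index.tolist()
print(coords)
```

Because the mask contains no missing values, the result does not depend on how stack treats NaN.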
Get index of row where column value changes from previous row
Use Series.diff to build a mask that tests whether the difference from the previous row is less than a threshold such as 0, then use boolean indexing on the index:
m = df1.val.diff().lt(0)
# to test for a difference less than another threshold, such as -7:
#m = df1.val.diff().lt(-7)
one = df1.index[~m]
two = df1.index[m]
print (one)
Int64Index([0, 1, 3, 5], dtype='int64')
print (two)
Int64Index([2, 4], dtype='int64')
If you need lists:
one = df1.index[~m].tolist()
two = df1.index[m].tolist()
Details:
print (df1.val.diff())
0 NaN
1 0.02
2 -8.80
3 10.55
4 -15.06
5 917.49
Name: val, dtype: float64
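For a version that can be run as-is, here is a small sketch with made-up values. The numbers in val are hypothetical (the original df1 is not shown in full), and the names drops and others are illustrative; only the sign of each step matters.

```python
import pandas as pd

# Hypothetical values chosen so that rows 2 and 4 are lower than the row before.
df1 = pd.DataFrame({'val': [10, 12, 3, 14, 1, 900]})

# diff() is NaN for the first row, and NaN < 0 evaluates to False,
# so the first row always lands in the "not a drop" group.
m = df1['val'].diff().lt(0)

drops = df1.index[m].tolist()
others = df1.index[~m].tolist()

print(drops)
print(others)
```

If the first row should be excluded from both groups, one option is to slice the mask from the second row onward before indexing.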
Python Pandas: Getting the Index of All Rows that Match a Column Value
You can use pd.Series.isin for this:
res = df[df['Channel'].isin({'A', 'B'})]
print(res)
# Rec Channel Value1 Value2
# 2 Event A 23 39.0
# 4 Post A 79 11.0
# 5 Post B 88 69.0
To return the row whose index label is 2:
res2 = res.loc[2]
print(res2)
# Rec Event
# Channel A
# Value1 23
# Value2 39
# Name: 2, dtype: object
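If what you need is the index itself rather than the filtered rows, the same isin mask can be applied to df.index. The frame below is a hypothetical reconstruction: rows 2, 4 and 5 follow the output shown above, while the remaining rows are invented for the sketch.

```python
import pandas as pd

# Hypothetical frame; only rows 2, 4 and 5 come from the output above.
df = pd.DataFrame({
    'Rec': ['Event', 'Post', 'Event', 'Event', 'Post', 'Post'],
    'Channel': ['C', 'D', 'A', 'C', 'A', 'B'],
    'Value1': [10, 15, 23, 41, 79, 88],
    'Value2': [5.0, 8.0, 39.0, 2.0, 11.0, 69.0],
})

# Boolean mask of rows whose Channel is one of the wanted values.
mask = df['Channel'].isin(['A', 'B'])

# Index labels of the matching rows.
matching_index = df.index[mask].tolist()
print(matching_index)
```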
Get the indexes of the values which are greater than 0 in the column of a dataframe
This should do the trick:
ans = df.index[df['Column_name']>0].tolist()
ans will be the list of index labels of the values that are greater than 0 in the column "Column_name".
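A quick check on made-up data, assuming a frame with string index labels (both the values and the labels are hypothetical):

```python
import pandas as pd

# Hypothetical column with negative, zero and positive values.
df = pd.DataFrame({'Column_name': [-2, 0, 5, 3, -1]},
                  index=['a', 'b', 'c', 'd', 'e'])

# Index labels of the rows where the value is strictly greater than 0.
ans = df.index[df['Column_name'] > 0].tolist()
print(ans)
```

Note that 0 itself is excluded because the comparison is strict.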
How can I get the index values in DF1 to where DF1's column values match DF2's custom multiindex values?
Not sure if this answers your query, but one option is to reset the index of df1 so that it becomes an ordinary column named 'Index', set Name, Age and Gender as the index, look up the matching keys from df2, and take the resulting 'Index' column. Would that work?
So that would be:
df1.reset_index().set_index(['Name','Age','Gender']).loc[df2.set_index(['Name','Age','Gender']).index]['Index'].values
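To make the one-liner easier to follow, here is a sketch on invented data. Everything in it is an assumption: the people, the index values, and in particular that df1's index is named 'Index' (otherwise reset_index would produce a column called 'index') and that every key in df2 occurs exactly once in df1.

```python
import pandas as pd

keys = ['Name', 'Age', 'Gender']

# Hypothetical df1 whose index is named 'Index'.
df1 = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Carol', 'Dana'],
     'Age': [25, 30, 35, 41],
     'Gender': ['F', 'M', 'F', 'F']},
    index=pd.Index([101, 102, 103, 104], name='Index'),
)

# Hypothetical df2 holding the key combinations to look up.
df2 = pd.DataFrame({'Name': ['Bob', 'Dana'],
                    'Age': [30, 41],
                    'Gender': ['M', 'F']})

# Move df1's index into a column, then index df1 by the key columns.
lookup = df1.reset_index().set_index(keys)

# Select the rows whose keys appear in df2 and read back the original index.
matched = lookup.loc[df2.set_index(keys).index]['Index'].values
print(matched)
```

If some keys in df2 might be missing from df1, loc would raise a KeyError, so a merge on the key columns may be a safer choice in that situation.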