Pandas GroupBy and select rows with the minimum value in a specific column
I feel like you're overthinking this. Just use groupby and idxmin:
df.loc[df.groupby('A').B.idxmin()]
A B C
2 1 2 10
4 2 4 4
df.loc[df.groupby('A').B.idxmin()].reset_index(drop=True)
A B C
0 1 2 10
1 2 4 4
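The answer above shows only the call, not the frame it ran on. Here is a self-contained sketch on hypothetical data (the values below are made up for illustration) that also shows how ties behave:

```python
import pandas as pd

# Hypothetical sample data (not from the original question); group 1 has a tie at B == 2
df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
                   'B': [5, 2, 2, 4, 7],
                   'C': [0, 10, 11, 4, 3]})

# idxmin returns the index label of the FIRST minimum of B within each group of A
min_idx = df.groupby('A')['B'].idxmin()

# Look those labels up to get the full rows, then renumber the index
result = df.loc[min_idx].reset_index(drop=True)
print(result)
```

Because idxmin returns a single label per group, the second row tied at B == 2 is dropped. If every tied row should be kept, the transform-based answers further down are the better fit.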
pandas groupby ID and select row with minimal value of specific columns
Bkeesey's answer looks like it almost got you to your solution. I added one more step to get the overall minimum for each group.
import pandas as pd

# create sample df
df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 3],
                   'A': [30, 14, 100, 67, 1, 20],
                   'B': [10, 1, 2, 5, 100, 3],
                   'C': [1, 2, 3, 4, 5, 6],
                   })
# set "ID" as the index
df = df.set_index('ID')
# get the per-ID min of each column (broadcast back to every row)
mindf = df[['A', 'B']].groupby('ID').transform('min')
# take the smaller of the two column minimums and add it to df
df['min'] = mindf.min(axis=1)
# filter df for rows where A or B matches that minimum
df2 = df.loc[(df['A'] == df['min']) | (df['B'] == df['min'])]
print(df2)
In my simplified example, I'm just finding the minimum between columns A and B. Here's the output:
A B C min
ID
1 14 1 2 1
2 100 2 3 2
3 1 100 5 1
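A more compact variant of the same idea (a swapped-in formulation, not the answer's own code) takes the row-wise minimum of A and B first and then compares it with its per-ID minimum. A row's own minimum equals the group's overall minimum exactly when A or B hits that value, so the selected rows should be the same:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 3],
                   'A': [30, 14, 100, 67, 1, 20],
                   'B': [10, 1, 2, 5, 100, 3],
                   'C': [1, 2, 3, 4, 5, 6]}).set_index('ID')

# Smallest of A and B on each row
row_min = df[['A', 'B']].min(axis=1)

# Smallest row minimum within each ID, broadcast back to every row of that ID
group_min = row_min.groupby(level='ID').transform('min')

# Keep rows whose own A/B minimum equals the group's overall minimum
df2 = df.loc[row_min == group_min]
print(df2)
```

This avoids adding a helper "min" column to df; if you do want that column in the output, assign group_min to it before filtering.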
Select all rows of dataframe that have a minimum value for a group
Use DataFrame.sort_values + DataFrame.drop_duplicates.
df.sort_values(['date','time']).drop_duplicates(subset='date')[['date','value']]
# date value
#1 1/12 13
#2 1/13 8
or
df.sort_values(['date','time']).groupby('date',as_index=False).first()[['date','value']]
# date value
# 0 1/12 13
# 1 1/13 8
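A self-contained sketch of both options, on hypothetical data shaped like the question's (date, time and value columns; the values are invented, and time is assumed to be a zero-padded "HH:MM" string so that it sorts correctly as text):

```python
import pandas as pd

# Hypothetical data; "time" strings are zero-padded so lexical order == time order
df = pd.DataFrame({'date': ['1/12', '1/12', '1/13', '1/13'],
                   'time': ['09:00', '08:30', '10:15', '07:45'],
                   'value': [20, 13, 5, 8]})

# Earliest time first within each date, so the first row per date is the one to keep
ordered = df.sort_values(['date', 'time'])

# Option 1: keep the first row for each date
first_rows = ordered.drop_duplicates(subset='date')[['date', 'value']]

# Option 2: the same selection through groupby().first()
first_rows_gb = ordered.groupby('date', as_index=False).first()[['date', 'value']]

print(first_rows)
print(first_rows_gb)
```

One difference worth knowing: GroupBy.first takes the first non-null value of each column separately, so with missing data it can combine values from different rows, whereas drop_duplicates always keeps a whole original row.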
Filter grouped pandas dataframe, keep all rows with minimum value in column
Let's try groupby.transform to get the minimum value of C per group, compare it with df['C'], and keep the rows where C equals the minimum:
df.loc[df.groupby('A')['C'].transform('min').eq(df['C'])].reset_index(drop=True)
A B C
0 SAM 23 1
1 SAM 23 1
2 BILL 36 1
3 BILL 36 1
4 JIMMY 33 2
5 JIMMY 33 2
6 CARTER 25 3
7 GRACE 27 4
8 TOMMY 32 7
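A runnable sketch of the same transform-and-compare pattern on a small hypothetical frame (names reused from the output above, numbers invented), where one group has two rows tied at its minimum:

```python
import pandas as pd

# Hypothetical data: SAM has two rows tied at the minimum C
df = pd.DataFrame({'A': ['SAM', 'SAM', 'SAM', 'BILL', 'BILL'],
                   'B': [23, 23, 23, 36, 36],
                   'C': [1, 1, 5, 2, 4]})

# Per-group minimum of C, aligned to every row of the original frame
group_min = df.groupby('A')['C'].transform('min')

# The boolean mask keeps every row tied at the minimum, not just the first one
result = df.loc[df['C'].eq(group_min)].reset_index(drop=True)
print(result)
```

Because transform returns a Series aligned with the original rows, the comparison is a plain element-wise mask, which is why ties survive here but not with idxmin.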
Groupby column keep multiple rows with minimum value
You are close; you only need to compare the id2 column with the Series returned by transform and filter with boolean indexing:
df = firstS[firstS['id2'] == firstS.groupby('id1')['id2'].transform('min')]
print(df)
id1 id2 num1
0 1 1 9
1 1 1 4
5 2 6 9
6 2 6 1
7 2 6 5
10 3 2 8
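An alternative that selects the same rows (a different technique from the answer's transform comparison) is to rank id2 within each id1 group using method='min', which gives all tied values the lowest rank, and keep rank 1. The frame below is a hypothetical stand-in for firstS:

```python
import pandas as pd

# Hypothetical stand-in for firstS (values invented for illustration)
firstS = pd.DataFrame({'id1': [1, 1, 1, 2, 2, 2],
                       'id2': [1, 1, 3, 6, 6, 8],
                       'num1': [9, 4, 2, 9, 1, 7]})

# rank(method='min') assigns every tied value the lowest rank in its group,
# so all rows sharing the group minimum get rank 1.0
ranks = firstS.groupby('id1')['id2'].rank(method='min')
df = firstS[ranks == 1]
print(df)
```

The rank form generalizes easily: ranks <= 2 keeps rows holding the two smallest distinct-or-tied positions per group.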
In pandas, find the row per group with the smallest value greater than a given value
Use idxmin instead of min to extract the index, then use loc:
df.loc[df[df.B > 5].groupby('A')['B'].idxmin()]
Output:
A B
2 C1 8
6 C2 8
10 C3 7
Alternatively, you can use sort_values followed by drop_duplicates:
df[df.B > 5].sort_values('B').drop_duplicates('A')
Output:
A B
10 C3 7
2 C1 8
6 C2 8
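A self-contained sketch of the filter-then-idxmin approach on hypothetical data (values invented), including a group whose values never exceed the threshold:

```python
import pandas as pd

# Hypothetical data; group C4 has no B value above 5
df = pd.DataFrame({'A': ['C1', 'C1', 'C1', 'C2', 'C2', 'C4'],
                   'B': [3, 8, 12, 9, 8, 4]})

threshold = 5  # hypothetical cutoff matching the "B > 5" condition above

# Keep only candidates above the threshold, then take the smallest per group
above = df[df['B'] > threshold]
result = df.loc[above.groupby('A')['B'].idxmin()]
print(result)
```

Since the filter runs before the groupby, a group with no value above the threshold (C4 here) does not appear in the result at all; the sort_values/drop_duplicates variant behaves the same way.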
Select only one with lowest value
Try this:
import pandas as pd

df = pd.DataFrame({'Id': [1, 2, 3],
                   'Website': ['facebook', 'facebook', 'line'],
                   'Rank': [25, 5, 9]})
# index the min ranks of each website
df.loc[df.groupby('Website')['Rank'].idxmin()]
Id Website Rank
1 2 facebook 5
2 3 line 9
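Another way to get the same rows (a swapped-in technique, not the answer's own) is to sort by Rank and keep the first row of each Website with groupby().head(1):

```python
import pandas as pd

df = pd.DataFrame({'Id': [1, 2, 3],
                   'Website': ['facebook', 'facebook', 'line'],
                   'Rank': [25, 5, 9]})

# Sort so the lowest Rank comes first, then keep the first row of each Website
lowest = df.sort_values('Rank').groupby('Website').head(1)
print(lowest)
```

Changing head(1) to head(n) keeps the n lowest-ranked rows per website, which idxmin cannot do directly.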