Pandas Groupby and Select Rows with the Minimum Value in a Specific Column

I feel like you're overthinking this. Just use groupby and idxmin:

df.loc[df.groupby('A').B.idxmin()]

   A  B   C
2  1  2  10
4  2  4   4

df.loc[df.groupby('A').B.idxmin()].reset_index(drop=True)

   A  B   C
0  1  2  10
1  2  4   4
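
If you want something you can paste and run, here is a minimal sketch. The sample values are made up for illustration (the original frame isn't shown here), picked so the result matches the output above.

```python
import pandas as pd

# Hypothetical sample data; the original question's frame is not shown
df = pd.DataFrame({
    'A': [1, 1, 1, 2, 2],
    'B': [4, 5, 2, 7, 4],
    'C': [3, 4, 10, 2, 4],
})

# For each group in A, idxmin gives the index label of the smallest B
min_idx = df.groupby('A')['B'].idxmin()

# .loc uses those labels to pull the full rows
result = df.loc[min_idx]
print(result)
```

Keep in mind that idxmin returns only the first matching label when a group has ties for the minimum, so you get exactly one row per group.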

pandas groupby ID and select row with minimal value of specific columns

Bkeesey's answer looks like it almost got you to your solution. I added one more step to get the overall minimum for each group.

import pandas as pd

# create sample df
df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 3],
                   'A': [30, 14, 100, 67, 1, 20],
                   'B': [10, 1, 2, 5, 100, 3],
                   'C': [1, 2, 3, 4, 5, 6],
                   })

# set "ID" as the index
df = df.set_index('ID')

# get the min for each column
mindf = df[['A','B']].groupby('ID').transform('min')

# get the min between columns and add it to df
df['min'] = mindf.min(axis=1)

# filter df for when A or B matches the min
df2 = df.loc[(df['A'] == df['min']) | (df['B'] == df['min'])]

print(df2)

In my simplified example, I'm just finding the minimum between columns A and B. Here's the output:

      A    B  C  min
ID
1    14    1  2    1
2   100    2  3    2
3     1  100  5    1
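
If you'd rather not add a helper column, the same filter can probably be written more compactly. This is only a sketch of an alternative, not the approach used above: it takes the row-wise minimum first, then the group minimum of that. The names row_min and group_min are mine.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 3],
                   'A': [30, 14, 100, 67, 1, 20],
                   'B': [10, 1, 2, 5, 100, 3],
                   'C': [1, 2, 3, 4, 5, 6]})

# smallest of A and B within each row
row_min = df[['A', 'B']].min(axis=1)

# smallest of those row minima within each ID group
group_min = row_min.groupby(df['ID']).transform('min')

# keep the rows that hold their group's overall minimum
df2 = df[row_min == group_min]
print(df2)
```

A row's A or B equals the group's overall minimum exactly when that row's own minimum equals it, so this should select the same rows (here with ID kept as a regular column).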

Select all rows of dataframe that have a minimum value for a group

Use DataFrame.sort_values + DataFrame.drop_duplicates.

df.sort_values(['date', 'time']).drop_duplicates(subset='date')[['date', 'value']]
#    date  value
# 1  1/12     13
# 2  1/13      8

or

df.sort_values(['date', 'time']).groupby('date', as_index=False).first()[['date', 'value']]
#    date  value
# 0  1/12     13
# 1  1/13      8
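
Here is a runnable sketch of both variants. The frame is hypothetical (the question's data isn't reproduced here); I chose the values so the earliest time per date gives the output shown above.

```python
import pandas as pd

# Hypothetical sample data: two readings per date
df = pd.DataFrame({
    'date': ['1/12', '1/12', '1/13', '1/13'],
    'time': ['09:00', '08:00', '07:00', '10:00'],
    'value': [20, 13, 8, 5],
})

# Sort so the earliest time comes first within each date
ordered = df.sort_values(['date', 'time'])

# Variant 1: keep the first row per date, original index labels survive
earliest = ordered.drop_duplicates(subset='date')[['date', 'value']]

# Variant 2: first row per date via groupby, index is renumbered
via_groupby = ordered.groupby('date', as_index=False).first()[['date', 'value']]

print(earliest)
print(via_groupby)
```

The two differ in the index they return. Also note that groupby's first() skips missing values per column by default, so the variants can disagree if the data contains NaN.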

Filter grouped pandas dataframe, keep all rows with minimum value in column

Use groupby.transform to get the minimum of C per group, then compare it with df['C'] and keep every row where C equals its group's minimum:

df.loc[df.groupby('A')['C'].transform('min').eq(df['C'])].reset_index(drop=True)
        A   B  C
0     SAM  23  1
1     SAM  23  1
2    BILL  36  1
3    BILL  36  1
4   JIMMY  33  2
5   JIMMY  33  2
6  CARTER  25  3
7   GRACE  27  4
8   TOMMY  32  7
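
To see why transform is the right tool when ties matter, here is a small sketch on made-up data (not the question's frame) that puts it next to idxmin.

```python
import pandas as pd

# Hypothetical data: SAM has two rows tied for the minimum C
df = pd.DataFrame({
    'A': ['SAM', 'SAM', 'SAM', 'BILL', 'BILL'],
    'B': [23, 23, 23, 36, 36],
    'C': [1, 1, 5, 1, 2],
})

# transform broadcasts each group's minimum back onto every row
group_min = df.groupby('A')['C'].transform('min')

# boolean mask keeps every row that equals its group's minimum
all_ties = df[df['C'].eq(group_min)]

# idxmin, by contrast, yields a single label per group
one_per_group = df.loc[df.groupby('A')['C'].idxmin()]

print(all_ties)
print(one_per_group)
```

Here all_ties keeps both tied SAM rows, while one_per_group keeps only the first of them.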

Groupby column keep multiple rows with minimum value

You are close. You only need to compare the id2 column with the Series returned by transform and filter with boolean indexing:

df = firstS[firstS['id2'] == firstS.groupby('id1')['id2'].transform('min')]
print(df)
    id1  id2  num1
0     1    1     9
1     1    1     4
5     2    6     9
6     2    6     1
7     2    6     5
10    3    2     8

In pandas find row per group which is smallest value greater than value

Use idxmin instead of min to extract the index, then use loc:

df.loc[df[df.B > 5].groupby('A')['B'].idxmin()]

Output:

     A  B
2   C1  8
6   C2  8
10  C3  7

Alternatively, you can use sort_values followed by drop_duplicates:

df[df.B > 5].sort_values('B').drop_duplicates('A')

Output:

     A  B
10  C3  7
2   C1  8
6   C2  8
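
A runnable sketch of both approaches, using a hypothetical frame (the question's data isn't shown here) built so the results match the outputs above:

```python
import pandas as pd

# Hypothetical sample data: three groups of four rows each
df = pd.DataFrame({
    'A': ['C1'] * 4 + ['C2'] * 4 + ['C3'] * 4,
    'B': [4, 9, 8, 3, 2, 10, 8, 5, 1, 12, 7, 5],
})

# Filter first, so only values greater than 5 compete for the minimum
above = df[df['B'] > 5]

# idxmin on the filtered frame returns labels that are still valid in df
result = df.loc[above.groupby('A')['B'].idxmin()]

# Alternative: sort by B and keep the first row seen for each A
alt = above.sort_values('B').drop_duplicates('A')

print(result)
print(alt)
```

Both select the same rows; only the row order differs. A group with no value above the threshold simply drops out of the result.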

Select only one with lowest value

Try this:

import pandas as pd

df = pd.DataFrame({'Id': [1, 2, 3],
                   'Website': ['facebook', 'facebook', 'line'],
                   'Rank': [25, 5, 9]})
# index the min ranks of each website
df.loc[df.groupby('Website')['Rank'].idxmin()]
   Id   Website  Rank
1   2  facebook     5
2   3      line     9

