Groupby and filter by max value in pandas
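One way is to take the latest Date per ID among the rows where Value is 1, tag those rows, and merge the tag back onto the original frame:
latest = df.query('Value == 1').groupby("ID", as_index=False).max().assign(Latest="Latest")
pd.merge(df, latest, how="left")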
Value ID Date Latest
0 1 5 2012 NaN
1 1 5 2013 Latest
2 0 12 2017 NaN
3 0 12 2022 NaN
4 1 27 2005 NaN
5 1 27 2011 Latest
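As a self-contained sketch of the same idea, the snippet below rebuilds the frame from the output above and marks the latest Value == 1 row per ID with a boolean mask instead of a merge. The mask-based variant is a swapped-in alternative rather than the answer's own method, and the names ones, latest_idx and is_latest are made up for illustration.

```python
import pandas as pd

# Data reconstructed from the output table above.
df = pd.DataFrame({
    "Value": [1, 1, 0, 0, 1, 1],
    "ID": [5, 5, 12, 12, 27, 27],
    "Date": [2012, 2013, 2017, 2022, 2005, 2011],
})

# Only rows with Value == 1 are candidates for the "Latest" tag.
ones = df[df["Value"] == 1]

# Index label of the row with the largest Date within each ID.
latest_idx = ones.groupby("ID")["Date"].idxmax()

# Tag those rows; every other row is left as NaN.
is_latest = df.index.isin(latest_idx)
df["Latest"] = pd.Series("Latest", index=df.index).where(is_latest)

print(df)
```

This avoids the merge entirely, which may be convenient when the frame has many columns that would otherwise take part in the join.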
Python Pandas Dataframe select row by max value in group
A standard approach is to use groupby(keys)[column].idxmax(). However, to select the desired rows using idxmax, you need idxmax to return unique index values. One way to obtain a unique index is to call reset_index.
Once you obtain the index values from groupby(keys)[column].idxmax(), you can then select the entire row using df.loc:
In [20]: df.loc[df.reset_index().groupby(['F_Type'])['to_date'].idxmax()]
Out[20]:
start end
F_Type to_date
A 20150908143000 345 316
B 20150908143000 10743 8803
C 20150908143000 19522 16659
D 20150908143000 433 65
E 20150908143000 7290 7375
F 20150908143000 0 0
G 20150908143000 1796 340
Note: idxmax returns index labels, not necessarily ordinals. After using reset_index the index labels happen to also be ordinals, but since idxmax is returning labels (not ordinals) it is better to always use idxmax in conjunction with df.loc, not df.iloc (as I originally did in this post).
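To illustrate the label-versus-ordinal point, here is a minimal sketch on a small hypothetical frame whose index labels are deliberately not 0..n-1. The column names are borrowed from the example above; the values and index labels are invented.

```python
import pandas as pd

# Hypothetical data; the index labels (10, 20, 30, 40) are not positions.
df = pd.DataFrame(
    {
        "F_Type": ["A", "A", "B", "B"],
        "to_date": [20150901, 20150908, 20150908, 20150902],
        "start": [1, 345, 10743, 2],
    },
    index=[10, 20, 30, 40],
)

# idxmax returns the index *label* of the max row within each group.
labels = df.groupby("F_Type")["to_date"].idxmax()

# Select whole rows by label. df.iloc[labels] would instead look up
# positions 20 and 30, which do not exist in this four-row frame.
by_label = df.loc[labels]

print(by_label)
```

If the index contained duplicate labels, df.loc would return every row sharing a label, which is why a unique index (for example via reset_index) is needed first.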
How to select row with max value in column from pandas groupby() groups?
You can do this by combining the sort_values/drop_duplicates approach (to keep the row with the largest order value per name) with a groupby that collects the list of stores each person has worked at.
# Get stores that each person works at
stores_for_each_name = df.groupby('name')['store'].apply(','.join)
# Get row with largest order value for each name
df = df.sort_values('orders', ascending=False).drop_duplicates('name').rename({'orders': 'max_orders'}, axis=1)
# Replace store column with comma-separated list of stores they have worked at
df = df.drop('store', axis=1)
df = df.join(stores_for_each_name, on='name')
Output:
name stuff max_orders store
3 bob xcxfcd 5 A
1 ann dsdfds 3 A,C
4 john uityuu 3 A,B,C
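The steps above can be sketched end to end on a small frame. The data below is hypothetical (the original input is not shown), chosen only so that the per-name results resemble the output above; best and stores are illustrative names.

```python
import pandas as pd

# Hypothetical input: one row per (name, store) with an order count.
df = pd.DataFrame({
    "name": ["ann", "ann", "bob", "john", "john", "john"],
    "store": ["A", "C", "A", "A", "B", "C"],
    "orders": [1, 3, 5, 2, 3, 1],
})

# Comma-separated list of stores per name, in original row order.
stores = df.groupby("name")["store"].apply(",".join)

best = (
    df.sort_values("orders", ascending=False)  # largest orders first
      .drop_duplicates("name")                 # keep the top row per name
      .rename(columns={"orders": "max_orders"})
      .drop(columns="store")                   # replaced by the joined list
      .join(stores, on="name")
)

print(best)
```

Note that the relative order of names with equal max_orders is not guaranteed by the default sort, so downstream code should not rely on it.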
get rows with largest value in grouping
Use DataFrameGroupBy.idxmax if you need to select only one row with the max value per group:
df = df.loc[df.groupby('id')['value'].idxmax()]
print (df)
id other_value value
2 1 b 5
5 2 d 6
7 3 f 4
10 4 e 7
If a group has multiple max values and you want to select all rows with the max value:
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4],
                   'other_value': ['a', 'e', 'b', 'b', 'a', 'd', 'b', 'f', 'a', 'c', 'e', 'f'],
                   'value': [1, 3, 5, 2, 5, 6, 2, 4, 6, 1, 7, 7]})
print (df)
id other_value value
0 1 a 1
1 1 e 3
2 1 b 5
3 2 b 2
4 2 a 5
5 2 d 6
6 3 b 2
7 3 f 4
8 4 a 6
9 4 c 1
10 4 e 7
11 4 f 7
df = df[df.groupby('id')['value'].transform('max') == df['value']]
print (df)
id other_value value
2 1 b 5
5 2 d 6
7 3 f 4
10 4 e 7
11 4 f 7
how do you fill row values of a column groupby with the max value of the grouped data
You can use a groupby in combination with transform('max'). I'm not sure whether you want to replace the 'fail' column or create a new one, but this should give you the expected results.
df['fail'] = df.groupby(['Cow', 'Lact'])['fail'].transform('max')
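A minimal sketch of how transform broadcasts the group maximum back to every row. The column names Cow, Lact and fail come from the answer; the values are invented, and the result is written to a new, hypothetical column fail_max so the original column stays visible for comparison.

```python
import pandas as pd

# Hypothetical data: three (Cow, Lact) groups.
df = pd.DataFrame({
    "Cow": [1, 1, 1, 2, 2],
    "Lact": [1, 1, 2, 1, 1],
    "fail": [0, 1, 0, 0, 0],
})

# transform returns a Series aligned with df's index, so each row
# receives the max of its own (Cow, Lact) group.
df["fail_max"] = df.groupby(["Cow", "Lact"])["fail"].transform("max")

print(df)
```

Unlike an aggregation with .max(), which returns one row per group, transform keeps the original shape, so the result can be assigned directly as a column.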