Keeping only rows with most recent date in dataframe
This can be done with sort_values and drop_duplicates:
df = df.sort_values(by=['Modified Date'], ascending=False)
df = df.drop_duplicates(subset='School ID', keep='first')
The sort ensures that, for each school, the newest date appears first; drop_duplicates then keeps the first occurrence of each school, which is the newest.
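As a minimal sketch of the sort-then-deduplicate approach described above, the snippet below runs it on a small made-up frame. The data and the `Status` column are hypothetical and only there to make the result visible; the final sort by `School ID` is just for readable output.

```python
import pandas as pd

# Hypothetical sample data; only the column names 'School ID' and
# 'Modified Date' come from the answer above.
df = pd.DataFrame({
    "School ID": [1, 1, 2, 2, 3],
    "Modified Date": pd.to_datetime(
        ["2021-01-05", "2021-03-10", "2020-12-01", "2020-11-15", "2021-02-20"]
    ),
    "Status": ["old", "new", "new", "old", "only"],
})

latest = (
    df.sort_values(by="Modified Date", ascending=False)  # newest rows first
      .drop_duplicates(subset="School ID", keep="first")  # keep newest per school
      .sort_values("School ID")                           # tidy ordering for display
)
print(latest)
```

If the date column holds strings rather than datetimes, it is probably safest to convert it with pd.to_datetime before sorting.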
Group by pandas dataframe and select latest in each group
Use idxmax in groupby and slice df with loc:
df.loc[df.groupby('id').date.idxmax()]
    id  product        date
2  220     6647  2014-10-16
5  826     3380  2015-05-19
8  901     4555  2014-11-01
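To make the idxmax step easier to follow, here is a small self-contained sketch on hypothetical data (the values below are invented, not the original poster's frame). idxmax returns the index label of the latest date in each group, and loc then selects those rows; this assumes the index labels are unique.

```python
import pandas as pd

# Hypothetical data with the same column names as the answer above.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3],
    "product": [10, 11, 12, 13, 14],
    "date": pd.to_datetime(
        ["2014-09-01", "2014-10-16", "2015-05-19", "2014-01-01", "2014-11-01"]
    ),
})

# Index label of the row holding the latest date within each id.
idx = df.groupby("id")["date"].idxmax()

# Select those rows; all original columns are kept.
latest = df.loc[idx]
print(latest)
```

The sketch uses bracket access (`["date"]`) instead of attribute access (`.date`); both select the same column here, but brackets also work for column names that are not valid Python identifiers.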
Pandas groupby a column and sort by date and get only the latest row
If date has higher precedence than content_id, use that fact in sort_values:
out = df.sort_values(['user_id','date','content_id']).groupby(['user_id'])[['content_id','date']].last()
Another possibility is to convert date to datetime and then find the index of the latest date using groupby + idxmax; then use loc to select the desired rows:
df['date'] = pd.to_datetime(df['date'])
out = df.loc[df.groupby('user_id')['date'].idxmax()]
Output:
         content_id        date
user_id
123              20  2020-10-14
234              19  2021-05-26
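The role of content_id as a tie-breaker in the sort-based variant can be seen in a short sketch. The frame below is hypothetical: user 123 has two rows on the same latest date, so the sort order on content_id decides which one `last()` returns.

```python
import pandas as pd

# Hypothetical data; user 123 has two rows sharing the latest date.
df = pd.DataFrame({
    "user_id": [123, 123, 123, 234, 234],
    "content_id": [18, 20, 19, 19, 7],
    "date": pd.to_datetime(
        ["2020-10-14", "2020-10-14", "2020-09-01", "2021-05-26", "2021-01-01"]
    ),
})

out = (
    df.sort_values(["user_id", "date", "content_id"])  # date first, then content_id
      .groupby("user_id")[["content_id", "date"]]
      .last()                                           # last row per user after sorting
)
print(out)
```

Note that `last()` takes the last non-null value per column, so with missing values the result may mix values from different rows; in that case the idxmax + loc variant is likely the safer choice.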
Filter for most recent event by group with pandas
It seems the sale_date column has strings. If you convert it to datetime dtype, then you can use groupby + idxmax:
df['sale_date'] = pd.to_datetime(df['sale_date'])
out = df.loc[df.groupby('account_number')['sale_date'].idxmax()]
Output:
   account_number product   sale_date
3             123    sale  2022-01-01
1             423  rental  2021-10-01
4             513    sale  2021-11-30
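A quick illustration of why the conversion matters, using hypothetical data and assuming a month/day/year string format (the original post does not show the raw format): date strings compare character by character, so the "largest" string is not necessarily the latest date.

```python
import pandas as pd

# Hypothetical data; sale_date is stored as month/day/year strings (an assumption).
df = pd.DataFrame({
    "account_number": [123, 123, 423],
    "product": ["sale", "rental", "rental"],
    "sale_date": ["09/01/2021", "10/01/2020", "03/15/2021"],
})

# String comparison is lexicographic: "10/01/2020" sorts after "09/01/2021",
# even though October 2020 is earlier than September 2021.
string_max = df["sale_date"].max()

# After conversion the comparison is chronological.
df["sale_date"] = pd.to_datetime(df["sale_date"], format="%m/%d/%Y")
out = df.loc[df.groupby("account_number")["sale_date"].idxmax()]
print(out)
```

Passing an explicit `format` avoids any ambiguity between day-first and month-first strings.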
Select all rows with 2 most recent dates by ID
You can try groupby().nth:
df[df['date']>=df.groupby("id")["date"].transform('nth', n=2)]
Output:
   id        date  value1  value2
0   a  2020-12-07      10    1000
1   a  2020-12-07      10    1000
2   a  2020-12-05      10    1000
3   a  2020-12-05      10    1000
6   b  2021-12-07      20    2000
7   b  2021-12-07      20    2000
8   b  2021-09-05      20    2000
9   b  2021-09-05      20    2000
12  c  2021-09-05      30    3000
13  c  2021-09-05      30    3000
14  c  2021-02-05      30    3000
15  c  2021-02-05      30    3000
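The nth-based filter above appears to depend on the rows being sorted newest first and on how often each date repeats within a group, since nth picks a row by position. As an alternative sketch (a different technique, not the answer's own method), a dense rank on the date within each group selects the two most recent distinct dates regardless of row order. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical data: unsorted rows, and dates repeat an uneven number of times.
df = pd.DataFrame({
    "id": ["a", "a", "a", "a", "a", "b", "b", "b"],
    "date": pd.to_datetime([
        "2020-12-07", "2020-12-07", "2020-12-05", "2020-11-01", "2020-12-05",
        "2021-12-07", "2021-09-05", "2021-01-01",
    ]),
    "value1": [10, 10, 10, 10, 10, 20, 20, 20],
})

# Dense rank: newest date in each id gets 1, next distinct date gets 2, and so on.
rank = df.groupby("id")["date"].rank(method="dense", ascending=False)

# Keep every row whose date is among the two most recent distinct dates of its id.
out = df[rank <= 2]
print(out)
```

Changing the threshold from 2 to another number keeps that many most recent distinct dates per group.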
Group By Customer Id and Also Take Date Column With Most Recent Value In Pandas
I think it would be easier to just sort them by date and then drop the duplicates.
df = df.sort_values('date_cancelled', ascending=False)
df = df.drop_duplicates(subset='owner_id', keep='first')
print(df)