Group by pandas DataFrame and select latest in each group
Use idxmax in groupby and slice df with loc:
df.loc[df.groupby('id').date.idxmax()]
id product date
2 220 6647 2014-10-16
5 826 3380 2015-05-19
8 901 4555 2014-11-01
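To make the idxmax + loc pattern concrete, here is a minimal runnable sketch. The sample frame is hypothetical (the question's data is not shown); it is only shaped so that the selected rows match the output above. It also assumes the index labels are unique, since loc looks rows up by label.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer above.
df = pd.DataFrame({
    "id": [220, 220, 220, 826, 826, 826, 901, 901, 901],
    "product": [6647, 6647, 6647, 3380, 3380, 3380, 4555, 4555, 4555],
    "date": pd.to_datetime([
        "2014-09-01", "2014-09-03", "2014-10-16",
        "2014-11-11", "2014-12-09", "2015-05-19",
        "2014-09-01", "2014-10-05", "2014-11-01",
    ]),
})

# For each id, idxmax returns the index label of the row with the largest date.
latest_idx = df.groupby("id")["date"].idxmax()

# loc then pulls exactly those rows out of the original frame.
latest = df.loc[latest_idx]
print(latest)
```

Because idxmax returns labels rather than positions, the result keeps the original index (2, 5, 8 here).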
Group by pandas DataFrame and select next upcoming date in each group
Filter the dates first, then drop duplicates:
df[df['date']>'2020-12-01'].sort_values(['id','date']).drop_duplicates('id')
Output:
id product date
2 220 6647 2020-12-16
4 826 3380 2020-12-09
8 901 4555 2021-11-01
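A small self-contained sketch of the filter, sort, drop_duplicates chain. The data and the cutoff variable are hypothetical, and the sketch assumes date is already a datetime column so that it can be compared with a Timestamp.

```python
import pandas as pd

# Hypothetical sample data with dates on both sides of the cutoff.
df = pd.DataFrame({
    "id": [220, 220, 826, 826, 901, 901],
    "product": [6647, 6647, 3380, 3380, 4555, 4555],
    "date": pd.to_datetime([
        "2020-11-16", "2020-12-16",
        "2020-12-09", "2021-05-19",
        "2020-09-01", "2021-11-01",
    ]),
})

cutoff = pd.Timestamp("2020-12-01")  # hypothetical reference date

upcoming = (
    df[df["date"] > cutoff]          # keep only dates after the cutoff
    .sort_values(["id", "date"])     # earliest remaining date first within each id
    .drop_duplicates("id")           # keep="first" is the default
)
print(upcoming)
```

Sorting before drop_duplicates is what makes "first" mean "earliest upcoming"; without the sort, the first row in file order would be kept instead.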
Get only the first and last rows of each group with pandas
Use groupby, find the head and tail for each group, and concat the two.
g = df.groupby('ID')
(pd.concat([g.head(1), g.tail(1)])
.drop_duplicates()
.sort_values('ID')
.reset_index(drop=True))
Time ID X Y
0 8:00 A 23 100
1 20:00 A 35 220
2 9:00 B 24 110
3 23:00 B 38 250
4 11:00 C 26 130
5 22:00 C 37 240
6 15:00 D 30 170
If you can guarantee each ID group has at least two rows, the drop_duplicates call is not needed.
Details
g.head(1)
Time ID X Y
0 8:00 A 23 100
1 9:00 B 24 110
3 11:00 C 26 130
7 15:00 D 30 170
g.tail(1)
Time ID X Y
7 15:00 D 30 170
12 20:00 A 35 220
14 22:00 C 37 240
15 23:00 B 38 250
pd.concat([g.head(1), g.tail(1)])
Time ID X Y
0 8:00 A 23 100
1 9:00 B 24 110
3 11:00 C 26 130
7 15:00 D 30 170
7 15:00 D 30 170
12 20:00 A 35 220
14 22:00 C 37 240
15 23:00 B 38 250
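The same steps on a small frame that can be run as-is. The data is hypothetical and includes one single-row group (D) to show why drop_duplicates matters. Passing kind="stable" to sort_values is an addition in this sketch, not part of the answer above; it keeps each group's first row ahead of its last row deterministically.

```python
import pandas as pd

# Hypothetical sample data; group D has a single row.
df = pd.DataFrame({
    "Time": ["8:00", "9:00", "10:00", "15:00", "20:00", "23:00"],
    "ID":   ["A", "B", "A", "D", "A", "B"],
    "X":    [23, 24, 25, 30, 35, 38],
})

g = df.groupby("ID")

first_last = (
    pd.concat([g.head(1), g.tail(1)])   # D's only row appears in both pieces
    .drop_duplicates()                  # drop the repeated D row
    .sort_values("ID", kind="stable")   # stable sort: first row stays before last row
    .reset_index(drop=True)
)
print(first_last)
```

Note that drop_duplicates compares column values, not index labels, so a group whose first and last rows are distinct but identical in every column would also collapse to one row.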
Pandas groupby select last row or second to last row based on value (0 or 1) in another column
You are checking if x['churned']==1 for all rows in the group. To check whether it is present anywhere in the group, you have to use any():
df = (df.groupby(['CustomerID'], as_index=False)
.apply(lambda x: x.iloc[-2] if (x['churned'] == 1).any() else x.iloc[-1])
.reset_index())
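A runnable sketch of the same selection rule, written as a plain loop over the groups rather than apply so each step is visible. The sample data, the month column, and the helper name pick_row are hypothetical. The guard for one-row groups is also an addition: iloc[-2] raises an IndexError when a group has only one row.

```python
import pandas as pd

# Hypothetical sample data: customer 1 churns, customer 2 does not,
# customer 3 churns but has only one row.
df = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 2, 3],
    "month":      [1, 2, 3, 1, 2, 1],
    "churned":    [0, 0, 1, 0, 0, 1],
})

def pick_row(group: pd.DataFrame) -> pd.Series:
    """Second-to-last row if any row in the group is churned, else the last row."""
    has_churn = (group["churned"] == 1).any()
    # Guard (an assumption about the desired behaviour): a one-row group has
    # no second-to-last row, so fall back to its only row.
    if has_churn and len(group) >= 2:
        return group.iloc[-2]
    return group.iloc[-1]

picked = pd.DataFrame([pick_row(g) for _, g in df.groupby("CustomerID")])
picked = picked.reset_index(drop=True)
print(picked)
```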
Pandas get topmost n records within each group
Did you try df.groupby('id').head(2)?
Output generated:
      id  value
id
1  0   1      1
   1   1      2
2  3   2      1
   4   2      2
3  7   3      1
4  8   4      1
(Keep in mind that you might need to order/sort before, depending on your data)
EDIT: As mentioned by the questioner, use df.groupby('id').head(2).reset_index(drop=True) to remove the MultiIndex and flatten the results:
id value
0 1 1
1 1 2
2 2 1
3 2 2
4 3 1
5 4 1
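The note about sorting first can be sketched as follows. head(2) takes the first two rows of each group in their current order, so "topmost" only means "largest" if the frame is sorted that way beforehand. The data is hypothetical, and treating a larger value as "top" is an assumption.

```python
import pandas as pd

# Hypothetical sample data, deliberately not sorted by value.
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 3],
    "value": [3, 1, 2, 5, 4, 9],
})

top2 = (
    df.sort_values(["id", "value"], ascending=[True, False])  # largest value first per id
    .groupby("id")
    .head(2)                   # first two rows of each group in the sorted order
    .reset_index(drop=True)
)
print(top2)
```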
Keep X% last rows by group in Pandas
groupby-apply-tail
Pass the desired size to tail() in a GroupBy.apply(). This is simpler than the iloc method below since it cleanly handles the "last 0 rows" case.
ratio = 0.6
(df.groupby('ID')
.apply(lambda x: x.tail(int(ratio * len(x))))
.reset_index(drop=True))
# ID value
# 0 A 2
# 1 B 13
# 2 B 14
# 3 B 15
ratio = 0.4
(df.groupby('ID')
.apply(lambda x: x.tail(int(ratio * len(x))))
.reset_index(drop=True))
# ID value
# 0 B 14
# 1 B 15
groupby-apply-iloc
Alternatively, index the desired size via iloc/slicing, but this is clunkier since [-0:] does not actually get the last 0 rows, so we have to check against that:
ratio = 0.6
(df.groupby('ID')
.apply(lambda x: x[-int(ratio * len(x)):] if int(ratio * len(x)) else None)
.reset_index(drop=True))
# ID value
# 0 A 2
# 1 B 13
# 2 B 14
# 3 B 15
ratio = 0.4
(df.groupby('ID')
.apply(lambda x: x[-int(ratio * len(x)):] if int(ratio * len(x)) else None)
.reset_index(drop=True))
# ID value
# 0 B 14
# 1 B 15
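For comparison, here is a sketch of a different technique that avoids apply altogether: count each row's position from the end of its group with cumcount and keep the rows whose position is below the per-group quota. This is an alternative, not the method shown above. The helper name keep_last_fraction and the sample data are hypothetical; the data is shaped to match the outputs above.

```python
import pandas as pd

# Hypothetical sample data: group A has 2 rows, group B has 5.
df = pd.DataFrame({
    "ID":    ["A", "A", "B", "B", "B", "B", "B"],
    "value": [1, 2, 11, 12, 13, 14, 15],
})

def keep_last_fraction(frame, key, ratio):
    """Keep the last int(ratio * group_size) rows of each group (hypothetical helper)."""
    grouped = frame.groupby(key)
    size = grouped[key].transform("size")         # group size, repeated on every row
    n_keep = (ratio * size).astype(int)           # truncates toward zero, like int()
    from_end = grouped.cumcount(ascending=False)  # 0 marks the last row of each group
    return frame[from_end < n_keep].reset_index(drop=True)

# The slicing pitfall mentioned above: -0 is just 0, so [-0:] returns everything.
assert [1, 2][-0:] == [1, 2]

kept_60 = keep_last_fraction(df, "ID", 0.6)
kept_40 = keep_last_fraction(df, "ID", 0.4)
print(kept_60)
print(kept_40)
```

A quota of 0 needs no special case here, because no row has a position from the end that is below 0.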
Pandas groupby a column and sort by date and get only the latest row
If date has higher precedence than content_id, use that fact in sort_values:
out = df.sort_values(['user_id','date','content_id']).groupby(['user_id'])[['content_id','date']].last()
Another possibility is to convert date to datetime and then find the latest date's index using groupby + idxmax; then use loc to filter the desired output:
df['date'] = pd.to_datetime(df['date'])
out = df.loc[df.groupby('user_id')['date'].idxmax()]
Output:
content_id date
user_id
123 20 2020-10-14
234 19 2021-05-26
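The two approaches can be compared side by side on a small frame. The data is hypothetical and shaped to match the output above. One difference worth knowing: GroupBy.last() takes the last non-null value of each column separately, so with missing values it can combine values from different rows, whereas idxmax + loc always returns whole original rows.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer above.
df = pd.DataFrame({
    "user_id":    [123, 123, 234, 234],
    "content_id": [19, 20, 19, 18],
    "date":       ["2020-10-10", "2020-10-14", "2021-05-26", "2021-05-20"],
})
df["date"] = pd.to_datetime(df["date"])

# Approach 1: sort so the latest date is last in each group, then take last().
by_sort = (
    df.sort_values(["user_id", "date", "content_id"])
    .groupby("user_id")[["content_id", "date"]]
    .last()
)

# Approach 2: locate the row holding the latest date per group and select it.
by_idxmax = df.loc[df.groupby("user_id")["date"].idxmax()]

print(by_sort)      # indexed by user_id
print(by_idxmax)    # keeps the original row labels and all columns
```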
Filter for most recent event by group with pandas
It seems the sale_date column has strings. If you convert it to datetime dtype, then you can use groupby + idxmax:
df['sale_date'] = pd.to_datetime(df['sale_date'])
out = df.loc[df.groupby('account_number')['sale_date'].idxmax()]
Output:
account_number product sale_date
3 123 sale 2022-01-01
1 423 rental 2021-10-01
4 513 sale 2021-11-30
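A sketch of why the conversion matters. Strings compare character by character, so a month/day/year string such as "9/1/2021" sorts after "10/1/2021". The sample data and the "%m/%d/%Y" format are assumptions, since the question's actual date format is not shown.

```python
import pandas as pd

# Hypothetical sample data with dates stored as month/day/year strings.
df = pd.DataFrame({
    "account_number": [123, 423, 513, 123, 513, 423],
    "product": ["rental", "rental", "sale", "sale", "sale", "sale"],
    "sale_date": ["9/30/2021", "10/1/2021", "11/14/2021",
                  "1/1/2022", "11/30/2021", "9/1/2021"],
})

# String comparison is lexicographic: "9" > "1", so the earlier date "wins".
assert max("9/1/2021", "10/1/2021") == "9/1/2021"

# Convert to datetime (format is an assumption), then idxmax compares real dates.
df["sale_date"] = pd.to_datetime(df["sale_date"], format="%m/%d/%Y")
out = df.loc[df.groupby("account_number")["sale_date"].idxmax()]
print(out)
```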