Python Pandas Group by date using datetime data
resample
df.resample('D', on='Date_Time').mean()
B
Date_Time
2001-10-01 4.5
2001-10-02 6.0
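The input frame is not shown in the answer. As a minimal sketch, assuming a frame with a datetime column Date_Time and a numeric column B (the sample values below are made up so that the daily means match the output above), the call can be tried like this:

```python
import pandas as pd

# Hypothetical sample data; the original frame is not shown in the answer.
df = pd.DataFrame({
    "Date_Time": pd.to_datetime([
        "2001-10-01 10:00:00",
        "2001-10-01 15:30:00",
        "2001-10-02 09:00:00",
    ]),
    "B": [4, 5, 6],
})

# Bucket rows into calendar days using the Date_Time column and average B.
daily = df.resample("D", on="Date_Time").mean()
print(daily)
```

The on= argument lets resample work on a regular column, so the frame does not need a DatetimeIndex first.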
Grouper
As suggested by @JosephCottam
df.set_index('Date_Time').groupby(pd.Grouper(freq='D')).mean()
B
Date_Time
2001-10-01 4.5
2001-10-02 6.0
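pd.Grouper also accepts key=, which avoids the set_index step. The sketch below uses made-up data with a deliberate gap on 2001-10-02 to show that days without rows come back as NaN, which is presumably why the older TimeGrouper variant ends with dropna():

```python
import pandas as pd

# Hypothetical sample data with no rows on 2001-10-02.
df = pd.DataFrame({
    "Date_Time": pd.to_datetime([
        "2001-10-01 10:00:00",
        "2001-10-01 15:30:00",
        "2001-10-03 09:00:00",
    ]),
    "B": [4, 5, 6],
})

# key= names the datetime column, so the index can stay as it is.
daily = df.groupby(pd.Grouper(key="Date_Time", freq="D"))["B"].mean()

# Days without any rows show up as NaN; dropna() removes them.
daily_nonempty = daily.dropna()
print(daily_nonempty)
```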
Deprecated uses of TimeGrouper
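On older pandas versions you could set the index to 'Date_Time' and use pd.TimeGrouper. It was deprecated in pandas 0.21 and has since been removed, so prefer pd.Grouper on current releases.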
df.set_index('Date_Time').groupby(pd.TimeGrouper('D')).mean().dropna()
B
Date_Time
2001-10-01 4.5
2001-10-02 6.0
How to group pandas DataFrame entries by date in a non-unique column
I'm using pandas 0.16.2. This has better performance on my large dataset:
data.groupby(data.date.dt.year)
Using the dt accessor makes it far easier to play around with weekofyear, dayofweek, etc. (on pandas 2.0 and later, weekofyear is no longer available; use dt.isocalendar().week instead).
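As a rough illustration of the dt accessor, here is a sketch on a made-up frame (the column names date and value are assumptions, not from the original question). It groups the same rows by year, by weekday, and by ISO week:

```python
import pandas as pd

# Hypothetical frame with a datetime64 column named "date".
data = pd.DataFrame({
    "date": pd.to_datetime(["2014-12-29", "2015-01-05", "2015-01-06", "2016-03-01"]),
    "value": [1, 2, 3, 4],
})

# Group by calendar year.
by_year = data.groupby(data["date"].dt.year)["value"].sum()

# Group by day of week (Monday is 0).
by_weekday = data.groupby(data["date"].dt.dayofweek)["value"].sum()

# Group by ISO week number; isocalendar() needs pandas 1.1 or later.
by_week = data.groupby(data["date"].dt.isocalendar().week)["value"].sum()

print(by_year, by_weekday, by_week, sep="\n\n")
```

Note that 2014-12-29 belongs to ISO week 1 of 2015, so grouping by week number alone mixes calendar years; grouping by both the ISO year and the week avoids that.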
Pandas dataframe groupby datetime month
Managed to do it:
b = pd.read_csv('b.dat')
b.index = pd.to_datetime(b['date'],format='%m/%d/%y %I:%M%p')
b.groupby(by=[b.index.month, b.index.year])
Or
b.groupby(pd.Grouper(freq='M')) # update for v0.21+; pandas 2.2+ prefers freq='ME'
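Since b.dat is not shown, the following sketch substitutes a small made-up frame with the same timestamp format. It groups by (year, month) taken from the index, and also shows monthly periods as one alternative that does not depend on the 'M' versus 'ME' offset alias:

```python
import pandas as pd

# Hypothetical data standing in for b.dat, which is not shown.
b = pd.DataFrame({
    "date": ["01/15/21 09:30AM", "01/20/21 04:45PM",
             "02/03/21 11:00AM", "01/05/22 08:15AM"],
    "value": [10, 20, 30, 40],
})
b.index = pd.to_datetime(b["date"], format="%m/%d/%y %I:%M%p")

# Group on (year, month) pairs; rename the keys so the result levels are distinct.
by_year_month = b.groupby(
    [b.index.year.rename("year"), b.index.month.rename("month")]
)["value"].sum()

# Alternative: convert the index to monthly periods and group on those.
by_period = b.groupby(b.index.to_period("M"))["value"].sum()

print(by_year_month)
print(by_period)
```

Grouping by month alone would merge January 2021 with January 2022, which is why the year is part of the key.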
Grouping by date range (timedelta) with Pandas
You can use a groupby with a custom group:
# convert to datetime
s = pd.to_datetime(df['date'], dayfirst=False)
# set up groups of consecutive dates within ± 3 days
# (group_keys=False keeps the original row index, so the result aligns with df)
group = (s.groupby(df['user_id'], group_keys=False)
          .apply(lambda x: x.diff().abs().gt('3days').cumsum())
          .rename('group')  # avoid a name clash with the 'date' column
        )
# group by ID and new group and aggregate
out = (df.groupby(['user_id', group], as_index=False)
         .agg({'date': 'last', 'val': 'sum'})
      )
output:
user_id date val
0 1 1-2-17 3
1 2 1-2-17 2
2 2 1-10-17 1
3 3 1-1-17 1
4 3 2-5-17 8
intermediates (sorted by user_id for clarity):
    user_id     date  val    datetime     diff      abs  >3days  cumsum
 0        1   1-1-17    1  2017-01-01      NaT      NaT   False       0
 3        1   1-1-17    1  2017-01-01   0 days   0 days   False       0
 4        1   1-2-17    1  2017-01-02   1 days   1 days   False       0
 1        2   1-1-17    1  2017-01-01      NaT      NaT   False       0
 5        2   1-2-17    1  2017-01-02   1 days   1 days   False       0
 6        2  1-10-17    1  2017-01-10   8 days   8 days    True       1
 2        3   1-1-17    1  2017-01-01      NaT      NaT   False       0
 7        3   2-1-17    1  2017-02-01  31 days  31 days    True       1
 8        3   2-2-17    1  2017-02-02   1 days   1 days   False       1
 9        3   2-3-17    2  2017-02-03   1 days   1 days   False       1
10        3   2-4-17    3  2017-02-04   1 days   1 days   False       1
11        3   2-5-17    1  2017-02-05   1 days   1 days   False       1
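To make the steps in the intermediates table reproducible, here is a self-contained sketch. The input frame is reconstructed from that table, and the code is a variant rather than the answer's exact method: it uses groupby(...).diff() and a helper column named block (both my own choices) in place of apply.

```python
import pandas as pd

# Input rows reconstructed from the intermediates table (month-day-year strings).
df = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 1, 2, 2, 3, 3, 3, 3, 3],
    "date": ["1-1-17", "1-1-17", "1-1-17", "1-1-17", "1-2-17", "1-2-17",
             "1-10-17", "2-1-17", "2-2-17", "2-3-17", "2-4-17", "2-5-17"],
    "val": [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 1],
})

s = pd.to_datetime(df["date"], format="%m-%d-%y")

# True where the gap to the user's previous row exceeds 3 days.
gap = s.groupby(df["user_id"]).diff().abs().gt(pd.Timedelta(days=3))

# Running count of gaps per user: each gap starts a new block of dates.
block = gap.astype(int).groupby(df["user_id"]).cumsum()

# Aggregate per (user, block), then drop the helper column.
out = (df.assign(block=block)
         .groupby(["user_id", "block"], as_index=False)
         .agg({"date": "last", "val": "sum"})
         .drop(columns="block"))
print(out)
```

This assumes each user's rows are already in chronological order, as they are in the sample; otherwise sort by user_id and date first.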
How to group by multiple columns and with only date part of datetime in pandas
Here you go. This builds a MultiIndex from "CELL ID", "Party", and the date part of "Date & Time":
# df["Date & Time"] = pd.to_datetime(df["Date & Time"]) # If not already datetime
df.set_index(["CELL ID", "Party", df["Date & Time"].dt.date])
Output:
                               LOC  Date & Time
CELL ID  Party    Date & Time
10631    3009787  2021-10-01   bwp  2021-10-01 08:20:30
                  2021-10-01   bwp  2021-10-01 08:40:50
50987    2275172  2021-10-02   bwp  2021-10-02 07:50:20
                  2021-10-02   bwp  2021-10-02 07:23:16
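The same keys can be passed to groupby directly if an aggregate per group is wanted rather than a re-indexed frame. A minimal sketch, with the frame rebuilt from the output above and a row count chosen as an example aggregate (the key name "Date" is my own):

```python
import pandas as pd

# Frame rebuilt from the output shown above.
df = pd.DataFrame({
    "CELL ID": [10631, 10631, 50987, 50987],
    "Party": [3009787, 3009787, 2275172, 2275172],
    "LOC": ["bwp", "bwp", "bwp", "bwp"],
    "Date & Time": pd.to_datetime([
        "2021-10-01 08:20:30", "2021-10-01 08:40:50",
        "2021-10-02 07:50:20", "2021-10-02 07:23:16",
    ]),
})

# Date part only, renamed so it does not share a name with the source column.
day = df["Date & Time"].dt.date.rename("Date")

# Number of rows per (CELL ID, Party, calendar day).
counts = df.groupby(["CELL ID", "Party", day]).size()
print(counts)
```

dt.date yields Python date objects (object dtype); dt.normalize() is an alternative that keeps the datetime64 dtype.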
Pandas groupby month and year (date as datetime64[ns]) and summarized by count
You can use groupby with dt.year and dt.month_name() taken from the date column.
print(df.groupby([df['date'].dt.year.rename('year'),
                  df['date'].dt.month_name().rename('month')])
        ['rides'].sum().reset_index())
year month rides
0 2019 January 2596765
1 2020 March 880003
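One thing to watch with month names is ordering: groupby sorts its keys, so names come out alphabetically. The sketch below uses made-up ride counts to show this, with a numeric-month variant for chronological order:

```python
import pandas as pd

# Hypothetical ride counts; the original data is not shown.
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-04-10", "2019-01-05", "2019-01-20", "2020-03-02"]),
    "rides": [10, 100, 50, 25],
})

year = df["date"].dt.year.rename("year")

# Month names sort alphabetically within each year (April before January).
by_name = (df.groupby([year, df["date"].dt.month_name().rename("month")])["rides"]
             .sum()
             .reset_index())

# Numeric months keep chronological order.
by_number = (df.groupby([year, df["date"].dt.month.rename("month")])["rides"]
               .sum()
               .reset_index())

print(by_name)
print(by_number)
```

Replacing sum() with count() or size() gives the number of rows per month instead of the total.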
group by week in pandas
First, convert the Date column with to_datetime and subtract one week, because we want the sum for the week ahead of each date rather than the week before it. Then use groupby with Grouper at frequency W-MON and aggregate with sum:
df['Date'] = pd.to_datetime(df['Date']) - pd.to_timedelta(7, unit='D')
# wrap the chain in parentheses so it can span several lines
df = (df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON')])['Quantity']
        .sum()
        .reset_index()
        .sort_values('Date'))
print(df)
Name Date Quantity
0 Apple 2017-07-10 90
3 orange 2017-07-10 20
1 Apple 2017-07-17 30
2 Orange 2017-07-24 40
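To see how W-MON bins behave, here is a self-contained sketch with made-up input rows (the original input is not shown). W-MON weeks end on a Monday and are labelled by that Monday, so each shifted date lands in the bin of the first Monday on or after it:

```python
import pandas as pd

# Hypothetical input rows; the original data is not shown.
df = pd.DataFrame({
    "Name": ["Apple", "Apple", "Apple", "Orange"],
    "Date": ["2017-07-11", "2017-07-14", "2017-07-18", "2017-07-28"],
    "Quantity": [20, 70, 30, 40],
})

# Shift every date back one week, as in the answer above.
df["Date"] = pd.to_datetime(df["Date"]) - pd.Timedelta(days=7)

weekly = (df.groupby(["Name", pd.Grouper(key="Date", freq="W-MON")])["Quantity"]
            .sum()
            .reset_index()
            .sort_values("Date"))
print(weekly)
```

For example, 2017-07-11 shifted back becomes 2017-07-04 (a Tuesday), which falls in the week labelled Monday 2017-07-10.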
Pandas groupby a column and sort by date and get only the latest row
If date has higher precedence than content_id, use that fact in sort_values:
out = df.sort_values(['user_id','date','content_id']).groupby(['user_id'])[['content_id','date']].last()
Another possibility is to convert date to datetime, find the index of the latest date per user with groupby + idxmax, and then use loc to select those rows:
df['date'] = pd.to_datetime(df['date'])
out = df.loc[df.groupby('user_id')['date'].idxmax()]
Output:
content_id date
user_id
123 20 2020-10-14
234 19 2021-05-26
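A runnable sketch of the idxmax variant, using made-up rows chosen so that the latest row per user matches the output above (the earlier rows are assumptions):

```python
import pandas as pd

# Hypothetical rows; only the latest row per user matches the output above.
df = pd.DataFrame({
    "user_id": [123, 123, 234, 234],
    "content_id": [10, 20, 19, 18],
    "date": ["2020-10-01", "2020-10-14", "2021-05-26", "2021-01-02"],
})
df["date"] = pd.to_datetime(df["date"])

# idxmax returns the index label of the latest date within each user_id group.
latest_idx = df.groupby("user_id")["date"].idxmax()

# Select those rows, then index by user_id to mirror the output shape.
latest = df.loc[latest_idx].set_index("user_id")
print(latest)
```

Because idxmax returns index labels, this relies on the frame having a unique index; call reset_index(drop=True) first if it might not.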