group by week in pandas
First, convert the Date column with to_datetime and subtract one week, because we want the sum for the week ahead of each date, not the week before it. Then use groupby with Grouper by W-MON and aggregate with sum:
df['Date'] = pd.to_datetime(df['Date']) - pd.to_timedelta(7, unit='D')
# wrap the chain in parentheses so it can span several lines
df = (df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON')])['Quantity']
        .sum()
        .reset_index()
        .sort_values('Date'))
print (df)
Name Date Quantity
0 Apple 2017-07-10 90
3 orange 2017-07-10 20
1 Apple 2017-07-17 30
2 Orange 2017-07-24 40
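To see what the one-week shift does to the labels, here is a minimal sketch. The input rows are an assumption, chosen so that the shifted result matches the output printed above; `pd.Timedelta(days=7)` is used in place of `pd.to_timedelta(7, unit='d')` and does the same thing.

```python
import pandas as pd

# Assumed sample rows (not given in the answer above).
df = pd.DataFrame({
    "Name": ["Apple", "orange", "Apple", "Orange", "Apple"],
    "Date": pd.to_datetime(["2017-07-11", "2017-07-14", "2017-07-14",
                            "2017-07-25", "2017-07-20"]),
    "Quantity": [20, 20, 70, 40, 30],
})

week = pd.Grouper(key="Date", freq="W-MON")

# Without the shift, each row is labeled by the Monday that closes its week.
unshifted = df.groupby(week)["Quantity"].sum()

# With the shift, the same rows are labeled one week earlier.
shifted = df.assign(Date=df["Date"] - pd.Timedelta(days=7))
weekly = (shifted.groupby(["Name", week])["Quantity"]
                 .sum()
                 .reset_index()
                 .sort_values("Date", kind="stable"))

print(unshifted)
print(weekly)
```

Note that W-MON bins are closed on the right, so a date that is itself a Monday belongs to the week ending on that Monday.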
Pandas Group by date weekly
Use DataFrame.resample by W with sum:
#convert date column to datetimes
df['date'] = pd.to_datetime(df['date'])
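# select several columns with a list; the tuple form ['count1','count2'] is rejected by recent pandas
df1 = df.resample('W', on='date')[['count1','count2']].sum()
Or use Grouper:
df1 = df.groupby(pd.Grouper(freq='W', key='date'))[['count1','count2']].sum()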
print (df1)
count1 count2
date
2019-12-15 3 75
2019-12-22 4 43
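A self-contained sketch of the two approaches side by side. The three input rows are made up for illustration (the original data is not shown here); they are picked so the weekly totals match the output above.

```python
import pandas as pd

# Hypothetical rows: two in the week ending Sunday 2019-12-15, one in the next week.
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-12-09", "2019-12-15", "2019-12-16"]),
    "count1": [1, 2, 4],
    "count2": [25, 50, 43],
})

# 'W' is an alias for 'W-SUN': weekly bins that end on, and are labeled by, Sunday.
by_resample = df.resample("W", on="date")[["count1", "count2"]].sum()
by_grouper = df.groupby(pd.Grouper(freq="W", key="date"))[["count1", "count2"]].sum()

print(by_resample)
print(by_grouper)
```

Both calls should produce the same frame; Grouper is the more flexible of the two because it can be combined with other group keys.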
Aggregate a Pandas Dataframe by week and month
Use resample by days with aggregate size first, then extract the day names with strftime, and finally, for the weeks, use resample per week with transform to get the first and last values:
df1 = df.resample('D', on='indate').size().reset_index(name='number of launchings')
df1['day'] = df1['indate'].dt.strftime('%a')
g = df1.resample('W', on='indate')['indate']
# parentheses are needed for the expression to continue on the next line
df1['week'] = (g.transform('first').dt.strftime('%Y-%m-%d') + ' - ' +
               g.transform('last').dt.strftime('%Y-%m-%d'))
Another solution is to use Grouper:
df1 = (df.groupby(pd.Grouper(freq='D', key='indate'))
         .size()
         .reset_index(name='number of launchings'))
df1['day'] = df1['indate'].dt.strftime('%a')
g = df1.groupby(pd.Grouper(freq='W', key='indate'))['indate']
df1['week'] = (g.transform('first').dt.strftime('%Y-%m-%d') + ' - ' +
               g.transform('last').dt.strftime('%Y-%m-%d'))
print (df1)
indate number of launchings day week
0 2016-12-19 2 Mon 2016-12-19 - 2016-12-25
1 2016-12-20 3 Tue 2016-12-19 - 2016-12-25
2 2016-12-21 4 Wed 2016-12-19 - 2016-12-25
3 2016-12-22 5 Thu 2016-12-19 - 2016-12-25
4 2016-12-23 1 Fri 2016-12-19 - 2016-12-25
5 2016-12-24 1 Sat 2016-12-19 - 2016-12-25
6 2016-12-25 1 Sun 2016-12-19 - 2016-12-25
7 2016-12-26 1 Mon 2016-12-26 - 2017-01-01
8 2016-12-27 1 Tue 2016-12-26 - 2017-01-01
9 2016-12-28 1 Wed 2016-12-26 - 2017-01-01
10 2016-12-29 1 Thu 2016-12-26 - 2017-01-01
11 2016-12-30 1 Fri 2016-12-26 - 2017-01-01
12 2016-12-31 1 Sat 2016-12-26 - 2017-01-01
13 2017-01-01 1 Sun 2016-12-26 - 2017-01-01
Sample data:
print (df)
indate
0 2016-12-19 12:16:00
1 2016-12-19 12:21:00
2 2016-12-20 12:32:00
3 2016-12-20 12:34:00
4 2016-12-20 12:40:00
5 2016-12-21 13:47:01
6 2016-12-21 14:27:01
7 2016-12-21 14:43:00
8 2016-12-21 15:02:00
9 2016-12-22 15:16:00
10 2016-12-22 15:22:00
11 2016-12-22 15:25:00
12 2016-12-22 15:22:00
13 2016-12-22 15:25:00
14 2016-12-23 12:16:00
15 2016-12-24 12:21:00
16 2016-12-25 12:32:00
17 2016-12-26 12:34:00
18 2016-12-27 12:40:00
19 2016-12-28 13:47:01
20 2016-12-29 14:27:01
21 2016-12-30 14:43:00
22 2016-12-31 15:02:00
23 2017-01-01 15:16:00
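As an alternative sketch (not the method used above), the week label can also be derived from weekly periods with `dt.to_period('W')`. The small frame below is a hypothetical stand-in for the daily counts. One difference to be aware of: this labels every row with the full Monday to Sunday calendar week, whereas the first/last transform uses the first and last dates actually present in each week.

```python
import pandas as pd

# Hypothetical daily counts, same shape as df1 above but only four days.
df1 = pd.DataFrame({
    "indate": pd.to_datetime(["2016-12-19", "2016-12-25", "2016-12-26", "2017-01-01"]),
    "number of launchings": [2, 1, 1, 1],
})

# 'W' periods are 'W-SUN' periods: each one runs Monday through Sunday.
weeks = df1["indate"].dt.to_period("W")
df1["week"] = (weeks.dt.start_time.dt.strftime("%Y-%m-%d") + " - " +
               weeks.dt.end_time.dt.strftime("%Y-%m-%d"))

print(df1)
```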
Pandas: how to aggregate data weekly?
Convert val to numeric first, and then remove the [] around 'lat', 'lon' so that they are passed as separate group keys:
df['val'] = pd.to_numeric(df['val'])
df['date'] = pd.to_datetime(df['date'])
df = (df.groupby(['lat', 'lon', pd.Grouper(key='date', freq='W-MON')])['val']
.mean()
.reset_index())
print (df)
lat lon date val
0 38.5437 -9.50659 2010-08-16 4.0
1 38.5437 -9.50659 2010-09-06 4.5
If you need month periods and the week of the year:
df = df.groupby([df['date'].dt.to_period('M').rename('month'),
                 df['date'].dt.isocalendar().week.rename('week'),
                 'lat', 'lon'])['val'].mean().reset_index()
print (df)
month week lat lon val
0 2010-08 32 38.5437 -9.50659 4.0
1 2010-09 35 38.5437 -9.50659 4.5
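One caveat when grouping by week number: `dt.isocalendar()` returns ISO weeks, and around New Year the ISO year can differ from the calendar year. The sketch below uses three dates picked purely for illustration; if the data spans several years, grouping by the ISO year together with the ISO week keeps those boundary weeks intact.

```python
import pandas as pd

# Illustrative dates; the last one falls in ISO week 53 of the previous year.
dates = pd.Series(pd.to_datetime(["2010-08-10", "2010-09-01", "2021-01-01"]))

iso = dates.dt.isocalendar()  # DataFrame with 'year', 'week' and 'day' columns
keys = pd.DataFrame({
    "calendar_year": dates.dt.year,
    "iso_year": iso["year"],
    "iso_week": iso["week"],
})

print(keys)
```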
Group data by week in Pandas
import pandas as pd
Name = ["Apple", "Orange", "Apple", "Orange", "Apple", "Banana", "Apple","Orange"]
Date = ["2022-03-15","2022-03-16","2022-03-17","2022-03-18","2022-03-19","2022-03-20","2019-12-19","2004-01-07"]
author = ["sahil_1","sahil_2","sahil_3","sahil_1","sahil_2","sahil_3","sahil_3","sahil_1"]
df = pd.DataFrame(zip(Name,Date,author), columns=["Name", "Date", "Author"])
df['Date'] = pd.to_datetime(df['Date']) - pd.to_timedelta(7, unit='d')
x = df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON')])['Name'].count()
print(x)
Aggregating weekly data by group into monthly sums in pandas
'Week' is not in the year_month format needed for your expected output, so first convert it to year_month with:
date = df['Week'].str.split(' ', expand=True)[0]
year_month = pd.to_datetime(date, errors='coerce').dt.strftime('%Y-%b').fillna(date)
before you use groupby:
df.groupby([year_month, 'Clinic']).sum()
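A runnable sketch of the same idea. The layout of the 'Week' strings and the 'Visits' column are assumptions, since the original data is not shown; the only thing relied on is that the first space-separated token of 'Week' is a date. The key is formatted with '%Y-%m' here to avoid locale-dependent month names, where the answer uses '%Y-%b'.

```python
import pandas as pd

# Hypothetical weekly data; 'Week' layout and 'Visits' are made up for illustration.
df = pd.DataFrame({
    "Week": ["2020-01-06 to 2020-01-12", "2020-01-13 to 2020-01-19",
             "2020-02-03 to 2020-02-09", "2020-01-06 to 2020-01-12"],
    "Clinic": ["A", "A", "A", "B"],
    "Visits": [5, 7, 2, 4],
})

# First token of 'Week' is the week's start date.
start = df["Week"].str.split(" ", expand=True)[0]

# Rows that do not parse as dates keep their original text via fillna.
year_month = (pd.to_datetime(start, errors="coerce")
                .dt.strftime("%Y-%m")
                .fillna(start)
                .rename("year_month"))

# Summing one named column avoids aggregating the text column 'Week'.
monthly = df.groupby([year_month, "Clinic"])["Visits"].sum()
print(monthly)
```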
aggregate within a week
If you need counts per route_id and week, first get the counts per group (freq='W' gives weekly bins that end on Sunday), and then aggregate per route_id with sum:
print (df)
card_id route_id timestamp
0 3941139920 34 2022-04-19 04:00:03
1 32111423 1305 2022-04-29 04:00:15
2 3941139920 34 2022-04-23 04:00:03
3 32111423 1305 2022-04-25 04:00:15
4 3941139920 34 2022-04-26 04:00:03
5 32111423 1305 2022-04-27 04:00:15
6 3941139920 34 2022-04-25 04:00:03
7 32111423 1305 2022-04-21 04:00:15
print (df.groupby(['route_id', pd.Grouper(freq='W', key='timestamp')]).size())
route_id timestamp
34 2022-04-24 2
2022-05-01 2
1305 2022-04-24 1
2022-05-01 3
dtype: int64
df = (df.groupby(['route_id', pd.Grouper(freq='W', key='timestamp')])
.size()
.groupby(level=0).sum()
.reset_index(name='count'))
print (df)
route_id count
0 34 4
1 1305 4
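With freq='W' (an alias for 'W-SUN') each bin ends on a Sunday and is labeled by that Sunday. If the weeks should instead start on Sunday and be labeled by their first day, one option is to pass closed='left' and label='left' to Grouper. This is a sketch on the sample rows above, with card_id left out for brevity:

```python
import pandas as pd

df = pd.DataFrame({
    "route_id": [34, 1305, 34, 1305, 34, 1305, 34, 1305],
    "timestamp": pd.to_datetime([
        "2022-04-19 04:00:03", "2022-04-29 04:00:15",
        "2022-04-23 04:00:03", "2022-04-25 04:00:15",
        "2022-04-26 04:00:03", "2022-04-27 04:00:15",
        "2022-04-25 04:00:03", "2022-04-21 04:00:15",
    ]),
})

# Bins run from Sunday (inclusive) to the next Sunday (exclusive),
# and each bin is labeled by the Sunday it starts on.
sunday_weeks = pd.Grouper(key="timestamp", freq="W-SUN",
                          closed="left", label="left")

weekly = df.groupby(["route_id", sunday_weeks]).size()
print(weekly)
```

The per-route totals are unaffected by how the weeks are cut; only the weekly breakdown and its labels change.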
How to aggregate a dataframe by week?
Just this once, after some research, I actually think I came up with a better solution that
- gives the correct aggregation
- gives the correct labels
Example below for weeks starting on a Thursday. Each week is labeled by the first day of its cycle.
library(tidyverse)
library(lubridate)
options(tibble.print_min = 30)
time <- seq(from =ymd("2014-02-24"),to= ymd("2014-03-20"), by="days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- tibble(time, values)  # data_frame() is deprecated in favor of tibble()
df2 <- df2 %>% mutate(day_of_week_label = wday(time, label = TRUE),
day_of_week = wday(time, label = FALSE))
df2 <- df2 %>% mutate(thursday_cycle = time - ((as.integer(day_of_week) - 5) %% 7),
tmp_1 = (as.integer(day_of_week) - 5),
tmp_2 = ((as.integer(day_of_week) - 5) %% 7))
which gives
> df2
# A tibble: 25 × 7
time values day_of_week_label day_of_week thursday_cycle tmp_1 tmp_2
<date> <dbl> <ord> <dbl> <date> <dbl> <dbl>
1 2014-02-24 30 Mon 2 2014-02-20 -3 4
2 2014-02-25 45 Tues 3 2014-02-20 -2 5
3 2014-02-26 30 Wed 4 2014-02-20 -1 6
4 2014-02-27 50 Thurs 5 2014-02-27 0 0
5 2014-02-28 50 Fri 6 2014-02-27 1 1
6 2014-03-01 20 Sat 7 2014-02-27 2 2
7 2014-03-02 35 Sun 1 2014-02-27 -4 3
8 2014-03-03 50 Mon 2 2014-02-27 -3 4
9 2014-03-04 35 Tues 3 2014-02-27 -2 5
10 2014-03-05 35 Wed 4 2014-02-27 -1 6
11 2014-03-06 50 Thurs 5 2014-03-06 0 0
12 2014-03-07 35 Fri 6 2014-03-06 1 1
13 2014-03-08 40 Sat 7 2014-03-06 2 2
14 2014-03-09 40 Sun 1 2014-03-06 -4 3
15 2014-03-10 20 Mon 2 2014-03-06 -3 4
16 2014-03-11 50 Tues 3 2014-03-06 -2 5
17 2014-03-12 25 Wed 4 2014-03-06 -1 6
18 2014-03-13 20 Thurs 5 2014-03-13 0 0
19 2014-03-14 30 Fri 6 2014-03-13 1 1
20 2014-03-15 50 Sat 7 2014-03-13 2 2
21 2014-03-16 50 Sun 1 2014-03-13 -4 3
22 2014-03-17 40 Mon 2 2014-03-13 -3 4
23 2014-03-18 40 Tues 3 2014-03-13 -2 5
24 2014-03-19 50 Wed 4 2014-03-13 -1 6
25 2014-03-20 40 Thurs 5 2014-03-20 0 0
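For readers following the pandas answers earlier on this page, the same modulo trick can be sketched in Python. The function name `cycle_start` is made up for this example. Python's `weekday()` counts Monday as 0, so Thursday is 3, where lubridate's `wday()` counts Sunday as 1 and Thursday as 5.

```python
from datetime import date, timedelta

def cycle_start(d, first_weekday=3):
    """Return the latest date on or before d that falls on first_weekday.

    Weekdays follow date.weekday(): Monday=0 ... Sunday=6, so 3 is Thursday.
    """
    # Days elapsed since the cycle's first day, always in the range 0..6.
    offset = (d.weekday() - first_weekday) % 7
    return d - timedelta(days=offset)

for d in [date(2014, 2, 24), date(2014, 2, 27), date(2014, 3, 2)]:
    print(d, "->", cycle_start(d))
```

The three dates are taken from the table above, so the results can be compared with the thursday_cycle column.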
R aggregate by week
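To group by the ISO definition of weeks, use isoyear() and isoweek() from lubridate (plain week() counts seven-day blocks from January 1 and is not the ISO week):
library(tidyverse)
library(lubridate)
df %>%
  group_by(year = isoyear(date), week = isoweek(date)) %>%
  summarise_if(is.numeric, sum)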
To group by weeks starting on Sunday, use @r2evans's suggestion:
require(tidyverse)
df %>%
group_by(week = format(date, '%Y-%U'))%>%
summarise_if(is.numeric, sum)
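The difference between the two week definitions shows up on Sundays. A small Python sketch (the helper name `week_keys` is made up here) compares the ISO key with the '%Y-%U' key for a Saturday and the Sunday after it:

```python
from datetime import date

def week_keys(d):
    """Return (ISO year-week key, Sunday-based '%Y-%U' key) for a date."""
    iso_year, iso_week, _ = d.isocalendar()
    return f"{iso_year}-{iso_week:02d}", d.strftime("%Y-%U")

saturday = date(2022, 4, 23)
sunday = date(2022, 4, 24)

print(saturday, week_keys(saturday))
print(sunday, week_keys(sunday))
```

Under ISO rules the Sunday still belongs to the week that began on the preceding Monday, while with %U it opens a new week.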