How to Aggregate a Dataframe by Week

Group by week in pandas

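First, convert the Date column with to_datetime and subtract one week. W-MON labels each weekly bin with the Monday that closes it, so shifting the dates back by seven days places each sum under the preceding Monday instead, which is the week ahead of the date rather than the week before it.

Then use groupby with a Grouper at frequency W-MON and aggregate with sum: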

df['Date'] = pd.to_datetime(df['Date']) - pd.to_timedelta(7, unit='d')
df = (df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON')])['Quantity']
        .sum()
        .reset_index()
        .sort_values('Date'))
print(df)
     Name       Date  Quantity
0   Apple 2017-07-10        90
3  orange 2017-07-10        20
1   Apple 2017-07-17        30
2  Orange 2017-07-24        40
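
To see what the one-week shift does to the labels, here is a minimal sketch on made-up data. The three input rows and the weekly_sum helper are assumptions for illustration, not the data from the original question.

```python
import pandas as pd

# Hypothetical input; not the data from the original question.
df = pd.DataFrame({
    "Name": ["Apple", "Apple", "Orange"],
    "Date": pd.to_datetime(["2017-07-11", "2017-07-14", "2017-07-26"]),
    "Quantity": [10, 20, 40],
})


def weekly_sum(frame):
    """Sum Quantity per Name and per weekly bin that ends on a Monday."""
    grouper = pd.Grouper(key="Date", freq="W-MON")
    return frame.groupby(["Name", grouper])["Quantity"].sum().reset_index()


# Without the shift, each row is labeled with the Monday that closes its bin.
unshifted = weekly_sum(df)

# Shifting the dates back seven days moves every label to the preceding Monday.
shifted = weekly_sum(df.assign(Date=df["Date"] - pd.Timedelta(days=7)))

print(unshifted)
print(shifted)
```

The sums are identical in both frames; only the Date labels differ by one week.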

Pandas Group by date weekly

Use DataFrame.resample with frequency W and aggregate with sum:

#convert date column to datetimes
df['date'] = pd.to_datetime(df['date'])

# select several columns with a list (double brackets)
df1 = df.resample('W', on='date')[['count1', 'count2']].sum()

Or use Grouper:

df1 = df.groupby(pd.Grouper(freq='W', key='date'))[['count1', 'count2']].sum()

print (df1)
            count1  count2
date
2019-12-15       3      75
2019-12-22       4      43
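
As a quick check that the two spellings agree, the sketch below runs both on a small frame. The input rows are invented for illustration (the original question's data is not shown here) and were chosen so that the weekly sums match the output above.

```python
import pandas as pd

# Hypothetical input rows, chosen to reproduce the weekly totals shown above.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2019-12-09", "2019-12-12", "2019-12-15", "2019-12-17", "2019-12-22"]
    ),
    "count1": [1, 1, 1, 2, 2],
    "count2": [25, 25, 25, 20, 23],
})

# 'W' is shorthand for 'W-SUN': each bin ends on a Sunday and carries that date.
by_resample = df.resample("W", on="date")[["count1", "count2"]].sum()
by_grouper = df.groupby(pd.Grouper(key="date", freq="W"))[["count1", "count2"]].sum()

print(by_resample)
print(by_grouper)
```

Both calls bin the rows the same way, so which one to use is mostly a matter of taste; Grouper becomes the more convenient form once other grouping keys are added.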

Aggregate a Pandas Dataframe by week and month

Use resample by day with size to count the rows per day, then extract the day names with strftime. For the weeks, resample per week and use transform to get the first and last date of each week:

df1 = df.resample('d', on='indate').size().reset_index(name='number of launchings')

df1['day'] = df1['indate'].dt.strftime('%a')
g = df1.resample('W', on='indate')['indate']
df1['week'] = (g.transform('first').dt.strftime('%Y-%m-%d') + ' - ' +
               g.transform('last').dt.strftime('%Y-%m-%d'))

Another solution is to use Grouper:

df1 = (df.groupby(pd.Grouper(freq='d', key='indate'))
.size()
.reset_index(name='number of launchings'))

df1['day'] = df1['indate'].dt.strftime('%a')
g = df1.groupby(pd.Grouper(freq='W', key='indate'))['indate']
df1['week'] = (g.transform('first').dt.strftime('%Y-%m-%d') + ' - ' +
g.transform('last').dt.strftime('%Y-%m-%d'))
print (df1)

       indate  number of launchings  day                     week
0  2016-12-19                     2  Mon  2016-12-19 - 2016-12-25
1  2016-12-20                     3  Tue  2016-12-19 - 2016-12-25
2  2016-12-21                     4  Wed  2016-12-19 - 2016-12-25
3  2016-12-22                     5  Thu  2016-12-19 - 2016-12-25
4  2016-12-23                     1  Fri  2016-12-19 - 2016-12-25
5  2016-12-24                     1  Sat  2016-12-19 - 2016-12-25
6  2016-12-25                     1  Sun  2016-12-19 - 2016-12-25
7  2016-12-26                     1  Mon  2016-12-26 - 2017-01-01
8  2016-12-27                     1  Tue  2016-12-26 - 2017-01-01
9  2016-12-28                     1  Wed  2016-12-26 - 2017-01-01
10 2016-12-29                     1  Thu  2016-12-26 - 2017-01-01
11 2016-12-30                     1  Fri  2016-12-26 - 2017-01-01
12 2016-12-31                     1  Sat  2016-12-26 - 2017-01-01
13 2017-01-01                     1  Sun  2016-12-26 - 2017-01-01

Sample data:

print (df)
                indate
0  2016-12-19 12:16:00
1  2016-12-19 12:21:00
2  2016-12-20 12:32:00
3  2016-12-20 12:34:00
4  2016-12-20 12:40:00
5  2016-12-21 13:47:01
6  2016-12-21 14:27:01
7  2016-12-21 14:43:00
8  2016-12-21 15:02:00
9  2016-12-22 15:16:00
10 2016-12-22 15:22:00
11 2016-12-22 15:25:00
12 2016-12-22 15:22:00
13 2016-12-22 15:25:00
14 2016-12-23 12:16:00
15 2016-12-24 12:21:00
16 2016-12-25 12:32:00
17 2016-12-26 12:34:00
18 2016-12-27 12:40:00
19 2016-12-28 13:47:01
20 2016-12-29 14:27:01
21 2016-12-30 14:43:00
22 2016-12-31 15:02:00
23 2017-01-01 15:16:00
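
The week range labels can also be derived from weekly periods instead of transform. This is an alternative sketch, not the method used in the answer above: it assumes one row per day (built here with date_range) and always prints the full Monday to Sunday range, even if some days of a week are missing from the data.

```python
import pandas as pd

# One row per calendar day, covering the same two weeks as the sample data.
days = pd.DataFrame({"indate": pd.date_range("2016-12-19", "2017-01-01", freq="D")})

# Weekly periods ('W' ends on Sunday) know their own start and end timestamps.
weeks = days["indate"].dt.to_period("W")
days["week"] = (weeks.dt.start_time.dt.strftime("%Y-%m-%d") + " - " +
                weeks.dt.end_time.dt.strftime("%Y-%m-%d"))
days["day"] = days["indate"].dt.strftime("%a")

print(days)
```

The transform-based version labels each week by the first and last dates actually present, so the two approaches differ only when a week is incomplete.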

Pandas: how to aggregate data weekly?

Convert val to numeric first, then pass 'lat' and 'lon' to groupby as plain column names, without the extra [] around them:

df['val'] = pd.to_numeric(df['val'])

df['date'] = pd.to_datetime(df['date'])

df = (df.groupby(['lat', 'lon', pd.Grouper(key='date', freq='W-MON')])['val']
.mean()
.reset_index())
print (df)
       lat      lon       date  val
0  38.5437 -9.50659 2010-08-16  4.0
1  38.5437 -9.50659 2010-09-06  4.5

If you need month periods and the ISO week of the year:

df = df.groupby([df['date'].dt.to_period('M').rename('month'),
                 df['date'].dt.isocalendar().week.rename('week'),
                 'lat', 'lon'])['val'].mean().reset_index()
print (df)
     month  week      lat      lon  val
0  2010-08    32  38.5437 -9.50659  4.0
1  2010-09    35  38.5437 -9.50659  4.5
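
A self-contained sketch of the month and week grouping follows. The three input rows are assumptions picked to be consistent with the output above; the original question's raw data is not reproduced here.

```python
import pandas as pd

# Hypothetical rows consistent with the output shown above.
df = pd.DataFrame({
    "date": pd.to_datetime(["2010-08-10", "2010-09-01", "2010-09-02"]),
    "lat": [38.5437, 38.5437, 38.5437],
    "lon": [-9.50659, -9.50659, -9.50659],
    "val": ["4", "4", "5"],  # stored as text, as in the question
})
df["val"] = pd.to_numeric(df["val"])

# Series used as grouping keys are renamed so the result gets readable columns.
keys = [
    df["date"].dt.to_period("M").rename("month"),
    df["date"].dt.isocalendar().week.rename("week"),
    "lat",
    "lon",
]
result = df.groupby(keys)["val"].mean().reset_index()
print(result)
```

Grouping by both month and week means a week that straddles two months is split into two rows, one per month.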

Group data by week in Pandas


import pandas as pd 

Name = ["Apple", "Orange", "Apple", "Orange", "Apple", "Banana", "Apple","Orange"]
Date = ["2022-03-15","2022-03-16","2022-03-17","2022-03-18","2022-03-19","2022-03-20","2019-12-19","2004-01-07"]
author = ["sahil_1","sahil_2","sahil_3","sahil_1","sahil_2","sahil_3","sahil_3","sahil_1"]

df = pd.DataFrame(zip(Name,Date,author), columns=["Name", "Date", "Author"])
df['Date'] = pd.to_datetime(df['Date']) - pd.to_timedelta(7, unit='d')
x = df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON')])['Name'].count()
print(x)

Aggregating weekly data by group into monthly sums in pandas

The 'Week' column is not in the year_month format that your expected output needs, so first convert it to year_month with:

date = df['Week'].str.split(' ', expand=True)[0]
year_month = pd.to_datetime(date, errors='coerce').dt.strftime('%Y-%b').fillna(date)

before you use groupby:

df.groupby([year_month, 'Clinic']).sum()
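
The sketch below runs the same two steps end to end. The layout of the 'Week' strings, the 'Visits' column and the 'Total' row are all assumptions made for illustration, since the original frame is not shown; the month abbreviations also assume an English locale.

```python
import pandas as pd

# Hypothetical frame: 'Week' starts with a date, except for a summary row.
df = pd.DataFrame({
    "Week": ["2019-01-07 to 2019-01-13", "2019-01-14 to 2019-01-20",
             "2019-02-04 to 2019-02-10", "Total"],
    "Clinic": ["A", "A", "A", "A"],
    "Visits": [5, 7, 3, 15],
})

# Keep the text before the first space, i.e. the start date of each week.
date = df["Week"].str.split(" ", expand=True)[0]

# Unparseable entries become NaT, then fall back to their original text.
year_month = (pd.to_datetime(date, errors="coerce")
                .dt.strftime("%Y-%b")
                .fillna(date)
                .rename("year_month"))

monthly = df.groupby([year_month, "Clinic"])["Visits"].sum()
print(monthly)
```

Each week is assigned to the month of its first day, so a week that runs across a month boundary counts entirely toward the earlier month.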

Aggregate within a week

If you need counts per route_id and per week, first count the rows per week (freq='W' is shorthand for W-SUN, so each bin ends on a Sunday and is labeled with that date), then aggregate per route_id with sum:

print (df)
      card_id  route_id           timestamp
0  3941139920        34 2022-04-19 04:00:03
1    32111423      1305 2022-04-29 04:00:15
2  3941139920        34 2022-04-23 04:00:03
3    32111423      1305 2022-04-25 04:00:15
4  3941139920        34 2022-04-26 04:00:03
5    32111423      1305 2022-04-27 04:00:15
6  3941139920        34 2022-04-25 04:00:03
7    32111423      1305 2022-04-21 04:00:15

print (df.groupby(['route_id', pd.Grouper(freq='W', key='timestamp')]).size())
route_id  timestamp
34        2022-04-24    2
          2022-05-01    2
1305      2022-04-24    1
          2022-05-01    3
dtype: int64

df = (df.groupby(['route_id', pd.Grouper(freq='W', key='timestamp')])
.size()
.groupby(level=0).sum()
.reset_index(name='count'))

print (df)
   route_id  count
0        34      4
1      1305      4
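
If the weeks should run from Sunday to Saturday and carry the Sunday as their label, one option is to close and label the bins on the left. This is a sketch of that variant on the sample data above, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "card_id": [3941139920, 32111423, 3941139920, 32111423,
                3941139920, 32111423, 3941139920, 32111423],
    "route_id": [34, 1305, 34, 1305, 34, 1305, 34, 1305],
    "timestamp": pd.to_datetime([
        "2022-04-19 04:00:03", "2022-04-29 04:00:15",
        "2022-04-23 04:00:03", "2022-04-25 04:00:15",
        "2022-04-26 04:00:03", "2022-04-27 04:00:15",
        "2022-04-25 04:00:03", "2022-04-21 04:00:15",
    ]),
})

# Default: bins end on Sunday and are labeled with that closing Sunday.
ending_sunday = df.groupby(
    ["route_id", pd.Grouper(key="timestamp", freq="W")]
).size()

# Variant: bins start on Sunday (inclusive) and are labeled with that Sunday.
starting_sunday = df.groupby(
    ["route_id",
     pd.Grouper(key="timestamp", freq="W-SUN", closed="left", label="left")]
).size()

print(ending_sunday)
print(starting_sunday)
```

With this data the counts happen to be the same in both variants because no timestamp falls on a Sunday; only the labels move from 2022-04-24 and 2022-05-01 to 2022-04-17 and 2022-04-24.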

How to aggregate a dataframe by week?

Just this once, after some research, I actually think I came up with a better solution that

  • gives the correct aggregation
  • gives the correct labels

The example below is for weeks starting on a Thursday. Each week is labeled by the first day of its cycle.

library(tidyverse)
library(lubridate)
options(tibble.print_min = 30)

time <- seq(from =ymd("2014-02-24"),to= ymd("2014-03-20"), by="days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- tibble(time, values)  # tibble() replaces the deprecated data_frame()

df2 <- df2 %>% mutate(day_of_week_label = wday(time, label = TRUE),
day_of_week = wday(time, label = FALSE))

df2 <- df2 %>% mutate(thursday_cycle = time - ((as.integer(day_of_week) - 5) %% 7),
tmp_1 = (as.integer(day_of_week) - 5),
tmp_2 = ((as.integer(day_of_week) - 5) %% 7))

which gives

> df2
# A tibble: 25 × 7
   time       values day_of_week_label day_of_week thursday_cycle tmp_1 tmp_2
   <date>      <dbl> <ord>                   <dbl> <date>         <dbl> <dbl>
 1 2014-02-24     30 Mon                         2 2014-02-20        -3     4
 2 2014-02-25     45 Tues                        3 2014-02-20        -2     5
 3 2014-02-26     30 Wed                         4 2014-02-20        -1     6
 4 2014-02-27     50 Thurs                       5 2014-02-27         0     0
 5 2014-02-28     50 Fri                         6 2014-02-27         1     1
 6 2014-03-01     20 Sat                         7 2014-02-27         2     2
 7 2014-03-02     35 Sun                         1 2014-02-27        -4     3
 8 2014-03-03     50 Mon                         2 2014-02-27        -3     4
 9 2014-03-04     35 Tues                        3 2014-02-27        -2     5
10 2014-03-05     35 Wed                         4 2014-02-27        -1     6
11 2014-03-06     50 Thurs                       5 2014-03-06         0     0
12 2014-03-07     35 Fri                         6 2014-03-06         1     1
13 2014-03-08     40 Sat                         7 2014-03-06         2     2
14 2014-03-09     40 Sun                         1 2014-03-06        -4     3
15 2014-03-10     20 Mon                         2 2014-03-06        -3     4
16 2014-03-11     50 Tues                        3 2014-03-06        -2     5
17 2014-03-12     25 Wed                         4 2014-03-06        -1     6
18 2014-03-13     20 Thurs                       5 2014-03-13         0     0
19 2014-03-14     30 Fri                         6 2014-03-13         1     1
20 2014-03-15     50 Sat                         7 2014-03-13         2     2
21 2014-03-16     50 Sun                         1 2014-03-13        -4     3
22 2014-03-17     40 Mon                         2 2014-03-13        -3     4
23 2014-03-18     40 Tues                        3 2014-03-13        -2     5
24 2014-03-19     50 Wed                         4 2014-03-13        -1     6
25 2014-03-20     40 Thurs                       5 2014-03-20         0     0
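
For readers working in pandas rather than R, the same modulo trick can be sketched as follows. This is a translation for illustration, not part of the original answer; it relies on pandas numbering weekdays from Monday = 0, so Thursday is 3 (lubridate's wday() uses Sunday = 1, hence the 5 above). The random values column is left out.

```python
import pandas as pd

df2 = pd.DataFrame({"time": pd.date_range("2014-02-24", "2014-03-20", freq="D")})

# Days elapsed since the most recent Thursday (0 on a Thursday itself).
days_since_thursday = (df2["time"].dt.dayofweek - 3) % 7

# Subtracting that offset labels every row with the Thursday opening its week.
df2["thursday_cycle"] = df2["time"] - pd.to_timedelta(days_since_thursday, unit="D")

# Rows per Thursday-based week.
rows_per_cycle = df2.groupby("thursday_cycle").size()
print(rows_per_cycle)
```

The resulting labels match the thursday_cycle column in the tibble above, for example 2014-02-24 maps to 2014-02-20 and 2014-03-02 maps to 2014-02-27.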

R aggregate by week

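To group by the ISO 8601 definition of weeks, use lubridate's isoweek() together with isoyear() (week() instead counts seven-day periods starting from January 1):

require(tidyverse)
library(lubridate)  # provides isoyear() and isoweek()
df %>%
  group_by(year = isoyear(date), week = isoweek(date)) %>%
  summarise_if(is.numeric, sum)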

To group by weeks starting on Sunday, use @r2evans's suggestion:

require(tidyverse)
df %>%
group_by(week = format(date, '%Y-%U'))%>%
summarise_if(is.numeric, sum)
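
The difference between the two week definitions shows up most clearly around New Year. The pandas sketch below is an illustration added for comparison, not part of the original answers; the three dates are chosen to straddle the 2020/2021 boundary.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2021-01-02", "2021-01-03", "2021-01-04"]))

# ISO 8601: weeks start on Monday and belong to the year holding their Thursday.
iso = dates.dt.isocalendar()
iso_key = iso["year"].astype(str) + "-W" + iso["week"].astype(str).str.zfill(2)

# %U: weeks start on Sunday; days before the first Sunday fall in week 00.
sunday_key = dates.dt.strftime("%Y-%U")

print(pd.DataFrame({"date": dates, "iso": iso_key, "sunday": sunday_key}))
```

Saturday 2021-01-02 and Sunday 2021-01-03 still belong to ISO week 53 of 2020, which is why the ISO week number should be paired with the ISO year rather than the calendar year when building a grouping key.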

