Aggregate by Week in R

R aggregate by week

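To group by the ISO 8601 definition of weeks, use lubridate's isoyear() and isoweek(). (lubridate's week() is not ISO: it counts complete 7-day periods since January 1, plus one.)

library(tidyverse)
library(lubridate)
df %>%
  group_by(year = isoyear(date), week = isoweek(date)) %>%
  summarise_if(is.numeric, sum)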
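The gap between "7-day periods since January 1" week numbers and ISO 8601 week numbers is easiest to see around New Year. The following is a minimal base R sketch, not part of the original answer; it assumes that format() on your platform supports the %V (ISO week) and %G (ISO year) output codes, and the three dates are chosen purely for illustration.

```r
# Illustrative dates around a year boundary (not from the original data)
d <- as.Date(c("2020-12-31", "2021-01-01", "2021-01-04"))

# Week number as complete 7-day periods since January 1, plus one
# (yday in POSIXlt is 0-based, so no "- 1" is needed)
jan1_week <- as.POSIXlt(d)$yday %/% 7 + 1

# ISO 8601 week number and the ISO year that week belongs to
iso_week <- as.integer(format(d, "%V"))
iso_year <- as.integer(format(d, "%G"))

cmp <- data.frame(date = d, jan1_week, iso_week, iso_year)
print(cmp)
```

Here 2021-01-01 falls in week 1 by the January 1 rule but in ISO week 53 of ISO year 2020, which is why an ISO week number should be paired with the ISO year rather than the calendar year.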

To group by weeks starting on Sunday, use @r2evans's suggestion:

library(tidyverse)
df %>%
  group_by(week = format(date, '%Y-%U')) %>%
  summarise_if(is.numeric, sum)
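As a small illustration of what the week label does, the base R sketch below compares %U (weeks starting on Sunday) with %W (weeks starting on Monday) for a Saturday and the Sunday after it. The dates are picked only for the example.

```r
# 2014-03-01 is a Saturday, 2014-03-02 is a Sunday
d <- as.Date(c("2014-03-01", "2014-03-02"))

# wday: 0 = Sunday ... 6 = Saturday (locale independent)
weekday_index <- as.POSIXlt(d)$wday

sunday_weeks <- format(d, "%Y-%U")  # the Sunday opens a new week
monday_weeks <- format(d, "%Y-%W")  # the Sunday still closes the old week

print(data.frame(date = d, weekday_index, sunday_weeks, monday_weeks))
```

With %U the two days get different labels ("2014-08" and "2014-09"), while with %W both are labeled "2014-08".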

How to aggregate a dataframe by week?

Just this once, after some research, I actually think I came up with a better solution that

  • gives the correct aggregation
  • gives the correct labels

The example below uses weeks starting on a Thursday. Each week is labeled by the first day of its cycle.

library(tidyverse)
library(lubridate)
options(tibble.print_min = 30)

time <- seq(from = ymd("2014-02-24"), to = ymd("2014-03-20"), by = "days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- tibble(time, values)  # tibble() replaces the deprecated data_frame()

# wday() numbers days 1 (Sunday) to 7 (Saturday), so Thursday is 5
df2 <- df2 %>% mutate(day_of_week_label = wday(time, label = TRUE),
                      day_of_week = wday(time, label = FALSE))

# thursday_cycle is the most recent Thursday on or before each date;
# tmp_1 and tmp_2 only expose the intermediate steps
df2 <- df2 %>% mutate(thursday_cycle = time - ((as.integer(day_of_week) - 5) %% 7),
                      tmp_1 = (as.integer(day_of_week) - 5),
                      tmp_2 = ((as.integer(day_of_week) - 5) %% 7))

which gives

> df2
# A tibble: 25 × 7
time values day_of_week_label day_of_week thursday_cycle tmp_1 tmp_2
<date> <dbl> <ord> <dbl> <date> <dbl> <dbl>
1 2014-02-24 30 Mon 2 2014-02-20 -3 4
2 2014-02-25 45 Tues 3 2014-02-20 -2 5
3 2014-02-26 30 Wed 4 2014-02-20 -1 6
4 2014-02-27 50 Thurs 5 2014-02-27 0 0
5 2014-02-28 50 Fri 6 2014-02-27 1 1
6 2014-03-01 20 Sat 7 2014-02-27 2 2
7 2014-03-02 35 Sun 1 2014-02-27 -4 3
8 2014-03-03 50 Mon 2 2014-02-27 -3 4
9 2014-03-04 35 Tues 3 2014-02-27 -2 5
10 2014-03-05 35 Wed 4 2014-02-27 -1 6
11 2014-03-06 50 Thurs 5 2014-03-06 0 0
12 2014-03-07 35 Fri 6 2014-03-06 1 1
13 2014-03-08 40 Sat 7 2014-03-06 2 2
14 2014-03-09 40 Sun 1 2014-03-06 -4 3
15 2014-03-10 20 Mon 2 2014-03-06 -3 4
16 2014-03-11 50 Tues 3 2014-03-06 -2 5
17 2014-03-12 25 Wed 4 2014-03-06 -1 6
18 2014-03-13 20 Thurs 5 2014-03-13 0 0
19 2014-03-14 30 Fri 6 2014-03-13 1 1
20 2014-03-15 50 Sat 7 2014-03-13 2 2
21 2014-03-16 50 Sun 1 2014-03-13 -4 3
22 2014-03-17 40 Mon 2 2014-03-13 -3 4
23 2014-03-18 40 Tues 3 2014-03-13 -2 5
24 2014-03-19 50 Wed 4 2014-03-13 -1 6
25 2014-03-20 40 Thurs 5 2014-03-20 0 0
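The table above stops at the labeling step. A possible way to finish the aggregation is sketched below in base R; the helper name week_start and the toy data frame (one unit per day, so each weekly sum equals the number of days in that cycle) are assumptions made for this illustration, not part of the original answer.

```r
# Hypothetical helper: first day of the week that contains `d`,
# for weeks starting on `start_wday` (1 = Sunday ... 7 = Saturday).
week_start <- function(d, start_wday = 5L) {
  wd <- as.POSIXlt(d)$wday + 1L        # shift to 1 = Sunday ... 7 = Saturday
  d - ((wd - start_wday) %% 7L)        # step back to the cycle's first day
}

# Toy data over the same date range as above, one unit per day
toy <- data.frame(
  time = seq(as.Date("2014-02-24"), as.Date("2014-03-20"), by = "day"),
  values = 1
)
toy$thursday_cycle <- week_start(toy$time)

# Sum the values within each Thursday-to-Wednesday cycle
weekly <- aggregate(values ~ thursday_cycle, data = toy, FUN = sum)
print(weekly)
```

The first and last cycles are partial (3 days and 1 day), which is worth keeping in mind when comparing weekly totals.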

Aggregate data by week, month etc in R

The answer mentioned suggests that you should use the xts package.

library(xts)
## create your zoo object from your data
## for a file, replace the text argument: read.zoo(yourfile, header = TRUE)
x.zoo <- read.zoo(text = ' Data Kg
1 2013-03-01 271
2 2013-03-06 374
3 2013-03-07 51
4 2013-03-12 210
5 2013-03-13 698
6 2013-03-15 328', header = TRUE)
### then aggregate
apply.weekly(x.zoo, mean) ## per week
apply.monthly(x.zoo, mean) ## per month

See ?apply.monthly, the help page for the apply.xxxly family:

Essentially a wrapper to the xts functions endpoints and period.apply, mainly as a convenience.
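To make the "wrapper" remark concrete, the sketch below calls endpoints() and period.apply() directly and compares the result with apply.weekly(). It assumes the xts package is installed and rebuilds the same six observations as an xts object rather than through read.zoo.

```r
library(xts)

# Same six observations as above, built directly as an xts object
x <- xts(
  c(271, 374, 51, 210, 698, 328),
  order.by = as.Date(c("2013-03-01", "2013-03-06", "2013-03-07",
                       "2013-03-12", "2013-03-13", "2013-03-15"))
)

# Row positions of the last observation in each week (with a leading 0)
ep <- endpoints(x, on = "weeks")

# Apply the function to each block of rows between consecutive endpoints
by_hand <- period.apply(x, INDEX = ep, FUN = mean)

# The convenience wrapper should give the same result
wrapped <- apply.weekly(x, mean)

print(by_hand)
```

Working with endpoints() directly is useful when the period is not covered by a ready-made wrapper, for example every second week via the k argument.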

Aggregate week and date in R by some specific rules

First we convert the dates in df2 into a numeric year-month-week key, then join the two tables on that key:

library(dplyr);library(lubridate)
df2$dt = ymd(df2$date)
df2$wk = day(df2$dt) %/% 7 + 1
df2$year_month_week = as.numeric(paste0(format(df2$dt, "%Y%m"), df2$wk))

df1 %>%
left_join(df2 %>% group_by(year_month_week) %>% slice(1) %>%
select(year_month_week, temperature))

Result

Joining, by = "year_month_week"
id year_month_week points temperature
1 1 2022051 65 36.1
2 1 2022052 58 36.6
3 1 2022053 47 NA
4 2 2022041 21 34.3
5 2 2022042 25 34.9
6 2 2022043 27 NA
7 2 2022044 43 NA
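The week-of-month rule used above is day %/% 7 + 1, which puts days 1 to 6 in week 1 and days 7 to 13 in week 2. The base R sketch below shows the resulting keys for a few illustrative dates, next to the alternative (day - 1) %/% 7 + 1, which gives every week exactly seven days. Which rule is right depends on the rules of the original question, so treat the alternative as an assumption.

```r
# Illustrative dates (not from the original data)
d <- as.Date(c("2022-05-01", "2022-05-06", "2022-05-07",
               "2022-05-14", "2022-05-28"))
dom <- as.integer(format(d, "%d"))    # day of month

wk_rule <- dom %/% 7 + 1              # rule used in the answer above
wk_alt  <- (dom - 1) %/% 7 + 1        # alternative: days 1-7 form week 1

# Numeric year-month-week key built the same way as above
key <- as.numeric(paste0(format(d, "%Y%m"), wk_rule))

print(data.frame(date = d, dom, wk_rule, wk_alt, key))
```

Note that under the first rule a month can reach week 5 on day 28, while under the alternative week 5 only starts on day 29.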

R: aggregate daily to weekly data by group

You can group by multiple columns.

library(dplyr)

df %>%
group_by(Country, week = lubridate::week(Date)) %>%
summarise(StringencyIndex = mean(StringencyIndex))

Base R aggregate (note that '%V' gives ISO 8601 week numbers, which can differ from lubridate::week()) -

aggregate(StringencyIndex~Country + week, transform(df, week = format(Date, '%V')), mean)
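Both versions above group by week number only, so the same week number from different years ends up in one group if the data span more than one year. One possible fix, sketched below in base R, is to group by a combined ISO year and week label. The toy data frame and the column name year_week are made up for this example, and the sketch assumes format() supports %G and %V on your platform.

```r
# Hypothetical daily data for two countries around a year boundary
toy <- data.frame(
  Country = c("A", "A", "A", "B", "B"),
  Date = as.Date(c("2020-12-31", "2021-01-01", "2021-01-04",
                   "2020-12-31", "2021-01-04")),
  StringencyIndex = c(10, 20, 30, 40, 60)
)

# ISO year plus ISO week, e.g. "2020-W53"
toy$year_week <- format(toy$Date, "%G-W%V")

# Mean per country and ISO year-week
weekly <- aggregate(StringencyIndex ~ Country + year_week, data = toy, FUN = mean)
print(weekly)
```

The label sorts correctly as text, so the result can be ordered by year_week without converting it back to a date.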

Aggregate/Sum data set by week and by product in R

In base R, you can use as.Date to convert your character df$Date into a Date variable and then use format with the proper format string to turn the date into a character variable that labels each week. aggregate then performs the aggregation by this new variable.

aggregate(df[2:4], list("weeks"=format(as.Date(df$Date, "%m/%d/%Y"), "%Y-%W")), FUN=sum)
weeks product product2 product3
1 2011-09 1 0 7
2 2011-14 3 8 2
3 2015-09 6 3 89
4 2017-09 7 1 8

See ?strptime for other date conversions.
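The parse-then-label step can be tried on its own. The sketch below uses two made-up date strings in the same month/day/year layout; it is only meant to show how the "%Y-%W" label behaves, including for days before the first Monday of the year.

```r
# Hypothetical character dates in month/day/year layout
dates_chr <- c("03/02/2011", "01/01/2011")

# Parse with an explicit format; the default as.Date() would not read these
d <- as.Date(dates_chr, format = "%m/%d/%Y")

# %W counts weeks starting on Monday; days before the first Monday get "00"
week_label <- format(d, "%Y-%W")
print(week_label)
```

The second date, a Saturday before the first Monday of 2011, lands in week "00", so the first group of a year can be a short one.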


As @akrun mentions in the comments, the data.table analog to the above base R code is

library(data.table)
setDT(df)[, lapply(.SD, sum),
by=.(weeks = format(as.IDate(Date, "%m/%d/%Y"), "%Y-%W"))]

Here, setDT converts the data.frame into a data.table, and lapply(.SD, sum) calculates the sum of each column, where .SD stands for the subset of the data.table that belongs to the current group. The groups are the unique values produced by format(as.IDate(Date, "%m/%d/%Y"), "%Y-%W"), where the conversion uses data.table's as.IDate in place of the base R as.Date.

Group dates by week in R

The code below calculates the current week relative to the minimum week in the data. week2 uses integer division of the day count to make the code more concise, although its week numbers don't always line up exactly with the direct calculation of years and week numbers using lubridate functions.

library(dplyr)
library(lubridate)

df2 %>% mutate(week = (year(Order_Date) - year(min(Order_Date)))*52 +
week(Order_Date) - week(min(Order_Date)),
week2 = (as.numeric(Order_Date) %/% 7) - (as.numeric(min(Order_Date)) %/% 7)) %>%
arrange(Order_Date)
   Order_Date week week2
1 2015-10-23 0 0
2 2015-10-27 0 0
3 2015-11-01 1 1
4 2015-11-07 2 2
5 2015-11-11 2 2
6 2015-11-14 3 3
7 2015-12-17 8 8
8 2016-01-05 10 10
9 2016-01-20 12 12
10 2016-01-30 14 14
11 2016-02-18 16 17
12 2016-03-13 20 20
13 2016-03-31 22 23
14 2016-04-04 23 23
15 2016-04-15 25 25
16 2016-05-08 28 28
17 2016-05-10 28 28
18 2016-07-27 39 39
19 2016-10-01 49 49
20 2016-10-11 50 50
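One reason the two columns can disagree is that as.numeric() on a Date counts days since 1970-01-01, which was a Thursday, so the %/% 7 buckets in week2 run from Thursday to Wednesday. The base R sketch below reproduces week2 for the first four dates of the table and shows a possible shift (adding 3 days) that makes the buckets run from Monday to Sunday instead. The shifted variant is an assumption added for illustration, not part of the original answer.

```r
# First four Order_Date values from the table above
d <- as.Date(c("2015-10-23", "2015-10-27", "2015-11-01", "2015-11-07"))
days <- as.numeric(d)                 # days since 1970-01-01 (a Thursday)

# Buckets of 7 days starting on Thursdays, relative to the earliest date
week2 <- days %/% 7 - min(days) %/% 7

# Shift by 3 days so that each bucket starts on a Monday instead
week_monday <- (days + 3) %/% 7 - (min(days) + 3) %/% 7

print(data.frame(Order_Date = d, week2, week_monday))
```

With Monday-based buckets, 2015-10-27 (a Tuesday) already belongs to the week after 2015-10-23 (a Friday), whereas the Thursday-based buckets keep them together.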

Aggregate dataframe with many columns per week

Without a minimal reproducible example it's difficult to know exactly which function to use to summarize the data (sum, mean, median, etc.).

For now we'll assume that each row represents a day or some more granular unit (since the date column is called Timestamp and we can't see whether there are actual time values in the field).

We use a combination of tidyr, dplyr and lubridate to create a summarized data frame that sums the data in the columns.

First, we generate some raw data in a format similar to the data in the screenshot, and read it into R.

rawData <- "Timestamp,Var.2,Amazonas,Antioquia,Arauca
2022-01-01,0,0,0,1
2022-01-02,0,0,1,3
2022-01-03,0,1,1,2
2022-01-04,0,0,1,0
2022-01-05,0,2,0,0
2022-01-06,3,0,2,2
2022-01-07,2,3,0,2
2022-01-08,1,0,0,0
2022-01-09,0,1,3,0
2022-01-10,0,0,0,0
2022-01-11,0,2,0,5
2022-01-12,0,0,3,0
2022-01-13,0,3,0,4
2022-01-14,0,0,4,0
2022-01-15,0,0,0,3
2022-01-16,0,0,0,0
2022-01-17,0,3,0,0
2022-01-18,0,0,2,3
2022-01-19,0,0,0,0
2022-01-20,0,2,0,0
2022-01-21,0,0,5,2
2022-01-22,0,0,0,0
2022-01-23,0,1,0,0
2022-01-24,0,0,3,1
2022-01-25,0,1,0,1
2022-01-26,0,0,0,1
2022-01-27,0,2,0,0
2022-01-28,0,2,0,1
2022-01-29,0,0,1,0
2022-01-30,0,0,1,0"

df <- read.csv(text = rawData,
colClasses = c("Date","numeric","numeric","numeric","numeric"))

Next, we load the required libraries. From the lubridate package we'll use the year() and week() functions to group the data by week of the year.

library(lubridate)
library(tidyr)
library(dplyr)

Finally, we use tidyr::pivot_longer() to create long format tidy data where each row represents one day's observations for one column in the wide format data frame, create the Year and Week columns, and summarise() the remaining columns in the data frame.

df %>% pivot_longer(-Timestamp,names_to="Area") %>%
mutate(Year = year(Timestamp),
Week = week(Timestamp)) %>%
group_by(Year,Week,Area) %>%
summarise(summedValue = sum(value)) -> summarisedData

head(summarisedData)

...and the first few rows of output:

> head(summarisedData)
# A tibble: 6 × 4
# Groups: Year, Week [2]
Year Week Area summedValue
<dbl> <dbl> <chr> <dbl>
1 2022 1 Amazonas 6
2 2022 1 Antioquia 5
3 2022 1 Arauca 10
4 2022 1 Var.2 5
5 2022 2 Amazonas 6
6 2022 2 Antioquia 10

If we need the data in the original format (wide format tidy data), we can use pivot_wider() to restore the data to its original shape.

# if necessary, pivot_wider() to restore data to original format
summarisedData %>%
pivot_wider(id_cols=c("Year","Week"),
names_from=Area,
values_from=summedValue)

...and the output:

# A tibble: 5 × 6
# Groups: Year, Week [5]
Year Week Amazonas Antioquia Arauca Var.2
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2022 1 6 5 10 5
2 2022 2 6 10 9 1
3 2022 3 5 7 8 0
4 2022 4 6 3 4 0
5 2022 5 0 2 0 0
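If the wide layout is all that is needed, the pivot round trip can be avoided. The following is a base R sketch of that idea, not the original answer's method: it uses the first eight rows of the raw data above and computes the week as complete 7-day periods since January 1 plus one, which is how lubridate's week() counts.

```r
# First eight rows of the raw data shown earlier
raw <- "Timestamp,Var.2,Amazonas,Antioquia,Arauca
2022-01-01,0,0,0,1
2022-01-02,0,0,1,3
2022-01-03,0,1,1,2
2022-01-04,0,0,1,0
2022-01-05,0,2,0,0
2022-01-06,3,0,2,2
2022-01-07,2,3,0,2
2022-01-08,1,0,0,0"

wide <- read.csv(text = raw, colClasses = c("Date", rep("numeric", 4)))

# Year and week number (7-day periods since January 1, plus one)
lt <- as.POSIXlt(wide$Timestamp)
wide$Year <- lt$year + 1900
wide$Week <- lt$yday %/% 7 + 1

# Sum every value column within each Year/Week group, staying in wide format
weekly <- aggregate(cbind(Var.2, Amazonas, Antioquia, Arauca) ~ Year + Week,
                    data = wide, FUN = sum)
print(weekly)
```

The week 1 row matches the pivoted result above (Amazonas 6, Antioquia 5, Arauca 10, Var.2 5); week 2 contains only the single row for 2022-01-08 in this shortened sample.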

