R aggregate by week
To group by the ISO definition of weeks (Monday start), use lubridate's isoweek() and isoyear(); plain week() just counts seven-day blocks from January 1:
library(tidyverse)
library(lubridate)
df %>%
  group_by(year = isoyear(date), week = isoweek(date)) %>%
  summarise_if(is.numeric, sum)
To group by weeks starting on Sunday, use @r2evans' suggestion:
library(tidyverse)
df %>%
  group_by(week = format(date, '%Y-%U')) %>%
  summarise_if(is.numeric, sum)
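The two schemes can disagree near year boundaries. A base-R sketch (sample dates assumed) showing both labels side by side, with no packages required:

```r
# ISO-8601 weeks (%G-%V, Monday start) vs Sunday-start weeks (%Y-%U)
dates <- as.Date(c("2021-01-02", "2021-01-03", "2021-01-04"))  # Sat, Sun, Mon

iso_week    <- format(dates, "%G-%V")  # %G is the ISO week-based year
sunday_week <- format(dates, "%Y-%U")  # %U: week 00 before the first Sunday

data.frame(dates, iso_week, sunday_week)
#        dates iso_week sunday_week
# 1 2021-01-02  2020-53     2021-00
# 2 2021-01-03  2020-53     2021-01
# 3 2021-01-04  2021-01     2021-01
```

Pairing %V with %Y instead of %G would mislabel the first two dates as "2021-53", which is why the ISO week number should always travel with the ISO year.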
How to aggregate a dataframe by week?
Just this once, after some research, I actually think I came up with a better solution that
- gives the correct aggregation
- gives the correct labels
Example below for weeks starting on a Thursday. The weeks are labeled by the first day of each cycle.
library(tidyverse)
library(lubridate)
options(tibble.print_min = 30)
time <- seq(from = ymd("2014-02-24"), to = ymd("2014-03-20"), by = "days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- tibble(time, values)
df2 <- df2 %>% mutate(day_of_week_label = wday(time, label = TRUE),
                      day_of_week = wday(time, label = FALSE))
df2 <- df2 %>% mutate(thursday_cycle = time - ((as.integer(day_of_week) - 5) %% 7),
                      tmp_1 = (as.integer(day_of_week) - 5),
                      tmp_2 = ((as.integer(day_of_week) - 5) %% 7))
which gives
> df2
# A tibble: 25 × 7
time values day_of_week_label day_of_week thursday_cycle tmp_1 tmp_2
<date> <dbl> <ord> <dbl> <date> <dbl> <dbl>
1 2014-02-24 30 Mon 2 2014-02-20 -3 4
2 2014-02-25 45 Tues 3 2014-02-20 -2 5
3 2014-02-26 30 Wed 4 2014-02-20 -1 6
4 2014-02-27 50 Thurs 5 2014-02-27 0 0
5 2014-02-28 50 Fri 6 2014-02-27 1 1
6 2014-03-01 20 Sat 7 2014-02-27 2 2
7 2014-03-02 35 Sun 1 2014-02-27 -4 3
8 2014-03-03 50 Mon 2 2014-02-27 -3 4
9 2014-03-04 35 Tues 3 2014-02-27 -2 5
10 2014-03-05 35 Wed 4 2014-02-27 -1 6
11 2014-03-06 50 Thurs 5 2014-03-06 0 0
12 2014-03-07 35 Fri 6 2014-03-06 1 1
13 2014-03-08 40 Sat 7 2014-03-06 2 2
14 2014-03-09 40 Sun 1 2014-03-06 -4 3
15 2014-03-10 20 Mon 2 2014-03-06 -3 4
16 2014-03-11 50 Tues 3 2014-03-06 -2 5
17 2014-03-12 25 Wed 4 2014-03-06 -1 6
18 2014-03-13 20 Thurs 5 2014-03-13 0 0
19 2014-03-14 30 Fri 6 2014-03-13 1 1
20 2014-03-15 50 Sat 7 2014-03-13 2 2
21 2014-03-16 50 Sun 1 2014-03-13 -4 3
22 2014-03-17 40 Mon 2 2014-03-13 -3 4
23 2014-03-18 40 Tues 3 2014-03-13 -2 5
24 2014-03-19 50 Wed 4 2014-03-13 -1 6
25 2014-03-20 40 Thurs 5 2014-03-20 0 0
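The answer computes thursday_cycle but never shows the final aggregation. A self-contained base-R sketch of the same idea, including the sum per cycle (constant values assumed so the result is easy to check; note base R's %u numbers Monday as 1, so the Thursday offset is 4 rather than the 5 used with lubridate's Sunday-first wday()):

```r
time   <- seq(as.Date("2014-02-24"), as.Date("2014-03-20"), by = "day")
values <- rep(10, length(time))

dow <- as.integer(format(time, "%u"))   # ISO weekday: 1 = Monday ... 7 = Sunday
# shift each date back to its most recent Thursday (%u == 4)
thursday_cycle <- time - ((dow - 4) %% 7)

# one row per Thursday-start week, labeled by the cycle's first day
aggregate(values, by = list(week_start = thursday_cycle), FUN = sum)
```

The first cycle (2014-02-20) only catches the three leading days Mon-Wed, so its sum is 30; full cycles sum to 70.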
Aggregate data by week, month etc in R
The answer mentioned there suggests using the xts
package.
library(xts)
## create your zoo object from your data
## (replace the text argument with read.zoo(yourfile, header = TRUE))
x.zoo <- read.zoo(text = 'Data Kg
1 2013-03-01 271
2 2013-03-06 374
3 2013-03-07 51
4 2013-03-12 210
5 2013-03-13 698
6 2013-03-15 328', header = TRUE)
### then aggregate
apply.weekly(x.zoo, mean) ## per week
apply.monthly(x.zoo, mean) ## per month
See ?apply.weekly and ?apply.monthly:
Essentially a wrapper to the xts functions endpoints and period.apply, mainly as a convenience.
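If xts is not available, a rough base-R equivalent (same sample data) bins the dates into Monday-start calendar weeks with cut.Date. The binning differs slightly: apply.weekly labels each group by its last observation, while cut labels it by the Monday that starts the week.

```r
dat <- data.frame(
  Date = as.Date(c("2013-03-01", "2013-03-06", "2013-03-07",
                   "2013-03-12", "2013-03-13", "2013-03-15")),
  Kg   = c(271, 374, 51, 210, 698, 328)
)
# cut(..., "week") labels each date with the Monday beginning its week
aggregate(Kg ~ cut(Date, "week"), data = dat, FUN = mean)
```

This yields three weekly means: 271, 212.5, and 412.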
Aggregate week and date in R by some specific rules
First we convert the dates in df2
into a year-month-week key, then join the two tables:
library(dplyr);library(lubridate)
df2$dt = ymd(df2$date)
df2$wk = day(df2$dt) %/% 7 + 1
df2$year_month_week = as.numeric(paste0(format(df2$dt, "%Y%m"), df2$wk))
df1 %>%
left_join(df2 %>% group_by(year_month_week) %>% slice(1) %>%
select(year_month_week, temperature))
Result
Joining, by = "year_month_week"
id year_month_week points temperature
1 1 2022051 65 36.1
2 1 2022052 58 36.6
3 1 2022053 47 NA
4 2 2022041 21 34.3
5 2 2022042 25 34.9
6 2 2022043 27 NA
7 2 2022044 43 NA
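The key construction can be checked in isolation. A base-R sketch with assumed dates; note that the %/% 7 + 1 rule makes "week 1" of each month only six days long (days 1-6), with seven-day weeks after that:

```r
dt  <- as.Date(c("2022-05-03", "2022-05-09", "2022-05-15"))
wk  <- as.integer(format(dt, "%d")) %/% 7 + 1   # days 1-6 -> 1, 7-13 -> 2, 14-20 -> 3, ...
key <- as.numeric(paste0(format(dt, "%Y%m"), wk))
key
# [1] 2022051 2022052 2022053
```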
R: aggregate daily to weekly data by group
You can group by multiple columns.
library(dplyr)
df %>%
group_by(Country, week = lubridate::week(Date)) %>%
summarise(StringencyIndex = mean(StringencyIndex))
Base R aggregate:
aggregate(StringencyIndex~Country + week, transform(df, week = format(Date, '%V')), mean)
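A minimal check of the base-R version with made-up data (column names taken from the answer, values assumed); all four dates fall in ISO week "10" of 2020, so one row per country comes back:

```r
df <- data.frame(
  Country = rep(c("A", "B"), each = 4),
  Date    = rep(seq(as.Date("2020-03-02"), by = "day", length.out = 4), 2),
  StringencyIndex = c(10, 20, 30, 40, 1, 2, 3, 4)
)
# %V yields the ISO-8601 week number as a two-digit string
aggregate(StringencyIndex ~ Country + week,
          transform(df, week = format(Date, "%V")), mean)
```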
Aggregate/Sum data set by week and by product in R
In base R, you can use as.Date
to convert the character df$Date into a Date variable, then use format
with the proper format string to turn each date into a character label for its week. aggregate
then performs the aggregation by this new variable.
aggregate(df[2:4], list("weeks"=format(as.Date(df$Date, "%m/%d/%Y"), "%Y-%W")), FUN=sum)
weeks product product2 product3
1 2011-09 1 0 7
2 2011-14 3 8 2
3 2015-09 6 3 89
4 2017-09 7 1 8
See ?strptime
for other date conversions.
As @akrun mentions in the comments, the data.table
analog to the above base R code is
library(data.table)
setDT(df)[, lapply(.SD, sum),
by=.(weeks = format(as.IDate(Date, "%m/%d/%Y"), "%Y-%W"))]
Here, setDT
converts the data.frame into a data.table, and lapply(.SD, sum)
computes the sum of each column, where .SD
stands for the subset of the data.table within each group. The groups are the unique values produced by format(as.IDate(Date, "%m/%d/%Y"), "%Y-%W"),
where the conversion uses data.table's as.IDate
in place of the base R as.Date.
Group dates by week in R
The code below calculates each week relative to the minimum week in the data. week2
uses modular arithmetic to make the code more concise, although its week numbers don't always line up exactly with the direct calculation from the lubridate
year() and week() functions.
library(dplyr)
library(lubridate)
df2 %>% mutate(week = (year(Order_Date) - year(min(Order_Date)))*52 +
week(Order_Date) - week(min(Order_Date)),
week2 = (as.numeric(Order_Date) %/% 7) - (as.numeric(min(Order_Date)) %/% 7)) %>%
arrange(Order_Date)
Order_Date week week2
1 2015-10-23 0 0
2 2015-10-27 0 0
3 2015-11-01 1 1
4 2015-11-07 2 2
5 2015-11-11 2 2
6 2015-11-14 3 3
7 2015-12-17 8 8
8 2016-01-05 10 10
9 2016-01-20 12 12
10 2016-01-30 14 14
11 2016-02-18 16 17
12 2016-03-13 20 20
13 2016-03-31 22 23
14 2016-04-04 23 23
15 2016-04-15 25 25
16 2016-05-08 28 28
17 2016-05-10 28 28
18 2016-07-27 39 39
19 2016-10-01 49 49
20 2016-10-11 50 50
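The reason week2 works: as.numeric() on a Date gives days since 1970-01-01 (a Thursday), so integer division by 7 indexes Thursday-to-Wednesday weeks, and differencing against the minimum counts whole weeks elapsed. A base-R sketch with three of the dates above:

```r
d <- as.Date(c("2015-10-23", "2015-10-27", "2015-11-01"))
week2 <- as.numeric(d) %/% 7 - as.numeric(min(d)) %/% 7
week2
# [1] 0 0 1
```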
Aggregate dataframe with many columns per week
Without a minimal reproducible example it's difficult to know exactly which function to use to summarize the data (sum, mean, median, etc.).
For now we'll assume that each row represents a day or some more granular unit (since the date column is called Timestamp
and we can't see whether there are actual time values in the field).
We use a combination of tidyr
, dplyr
and lubridate
to create a summarized data frame that sums the data in the columns.
First, we generated some raw data that is in a format similar to the data in the screen shot, and read it into R.
rawData <- "Timestamp,Var.2,Amazonas,Antioquia,Arauca
2022-01-01,0,0,0,1
2022-01-02,0,0,1,3
2022-01-03,0,1,1,2
2022-01-04,0,0,1,0
2022-01-05,0,2,0,0
2022-01-06,3,0,2,2
2022-01-07,2,3,0,2
2022-01-08,1,0,0,0
2022-01-09,0,1,3,0
2022-01-10,0,0,0,0
2022-01-11,0,2,0,5
2022-01-12,0,0,3,0
2022-01-13,0,3,0,4
2022-01-14,0,0,4,0
2022-01-15,0,0,0,3
2022-01-16,0,0,0,0
2022-01-17,0,3,0,0
2022-01-18,0,0,2,3
2022-01-19,0,0,0,0
2022-01-20,0,2,0,0
2022-01-21,0,0,5,2
2022-01-22,0,0,0,0
2022-01-23,0,1,0,0
2022-01-24,0,0,3,1
2022-01-25,0,1,0,1
2022-01-26,0,0,0,1
2022-01-27,0,2,0,0
2022-01-28,0,2,0,1
2022-01-29,0,0,1,0
2022-01-30,0,0,1,0"
df <- read.csv(text = rawData,
colClasses = c("Date","numeric","numeric","numeric","numeric"))
Next, we load the required libraries. From lubridate
package we'll use the year()
and week()
functions to group the data by week of the year.
library(lubridate)
library(tidyr)
library(dplyr)
Finally, we use tidyr::pivot_longer()
to create long format tidy data where each row represents one day's observations for one column in the wide format data frame, create the Year
and Week
columns, and summarise()
the remaining columns in the data frame.
df %>% pivot_longer(-Timestamp,names_to="Area") %>%
mutate(Year = year(Timestamp),
Week = week(Timestamp)) %>%
group_by(Year,Week,Area) %>%
summarise(summedValue = sum(value)) -> summarisedData
head(summarisedData)
...and the first few rows of output:
> head(summarisedData)
# A tibble: 6 × 4
# Groups: Year, Week [2]
Year Week Area summedValue
<dbl> <dbl> <chr> <dbl>
1 2022 1 Amazonas 6
2 2022 1 Antioquia 5
3 2022 1 Arauca 10
4 2022 1 Var.2 5
5 2022 2 Amazonas 6
6 2022 2 Antioquia 10
If we need the data in the original format (wide format tidy data), we can use pivot_wider()
to restore the data to its original shape.
# if necessary, pivot_wider() to restore data to original format
summarisedData %>%
pivot_wider(id_cols=c("Year","Week"),
names_from=Area,
values_from=summedValue)
...and the output:
# A tibble: 5 × 6
# Groups: Year, Week [5]
Year Week Amazonas Antioquia Arauca Var.2
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2022 1 6 5 10 5
2 2022 2 6 10 9 1
3 2022 3 5 7 8 0
4 2022 4 6 3 4 0
5 2022 5 0 2 0 0