How do I group my date variable into month/year in R?
Here is an example using dplyr. You simply use the corresponding date format strings, %m for month and %Y for year, in the format() call:
set.seed(123)
df <- data.frame(date = seq.Date(from = as.Date("01/01/1998", "%d/%m/%Y"),
to = as.Date("01/01/2000", "%d/%m/%Y"), by = "day"),
value = sample(seq(5), 731, replace = TRUE))
head(df)
date value
1 1998-01-01 2
2 1998-01-02 4
3 1998-01-03 3
4 1998-01-04 5
5 1998-01-05 5
6 1998-01-06 1
library(dplyr)
df %>%
mutate(month = format(date, "%m"), year = format(date, "%Y")) %>%
group_by(month, year) %>%
summarise(total = sum(value))
Source: local data frame [25 x 3]
Groups: month [?]
month year total
(chr) (chr) (int)
1 01 1998 105
2 01 1999 91
3 01 2000 3
4 02 1998 74
5 02 1999 77
6 03 1998 96
7 03 1999 86
8 04 1998 91
9 04 1999 95
10 05 1998 93
.. ... ... ...
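If month and year are only ever used together, a single year-month key may be simpler, and as text it sorts in calendar order. The following is a minimal sketch that uses base R's aggregate() rather than dplyr, on the same example data; the column name ym is just an illustrative choice, not something from the answer above.

```r
# Rebuild the example data from above
set.seed(123)
df <- data.frame(
  date  = seq.Date(as.Date("1998-01-01"), as.Date("2000-01-01"), by = "day"),
  value = sample(seq(5), 731, replace = TRUE)
)

# One combined key such as "1998-01" (hypothetical column name)
df$ym <- format(df$date, "%Y-%m")

# Sum value within each year-month; rows come back sorted by the key
monthly <- aggregate(value ~ ym, data = df, FUN = sum)
head(monthly)
```

The result has one row per year-month (25 rows for this date range), so January 1998 and January 1999 sit a year apart instead of next to each other.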
How to group data by month in R with dplyr
Using lubridate::month and plain dplyr:
library(dplyr)
dummy <- data.frame(
orderdate = seq(as.Date("2020-01-02"),as.Date("2021-07-13"), by = "days")
)
dummy %>%
as_tibble %>%
mutate(month = lubridate::month(orderdate)) %>%
group_by(month) %>%
summarise(n = n())
month n
<dbl> <int>
1 1 61
2 2 57
3 3 62
4 4 60
5 5 62
6 6 60
7 7 44
8 8 31
9 9 30
10 10 31
11 11 30
12 12 31
As a table:
dummy2 <- dummy %>%
as_tibble %>%
mutate(month = lubridate::month(orderdate)) %>%
group_by(month) %>%
summarise(n = n()) %>%
select(n) %>%
t %>%
as.table
colnames(dummy2) <- 1:12
dummy2
1 2 3 4 5 6 7 8 9 10 11 12
n 61 57 62 60 62 60 44 31 30 31 30 31
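Note that grouping on the month number alone pools the same calendar month from different years: the 61 shown for month 1 combines January 2020 and January 2021. If the years should stay separate, one option is to count by a year-month key instead. A small base R sketch on the same dummy data (per_month is an illustrative name):

```r
dummy <- data.frame(
  orderdate = seq(as.Date("2020-01-02"), as.Date("2021-07-13"), by = "days")
)

# Count rows per calendar month, keeping the years apart
per_month <- table(format(dummy$orderdate, "%Y-%m"))

# January 2020 has 30 rows (the data start on the 2nd), January 2021 has 31
per_month[c("2020-01", "2021-01")]
```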
Grouping daily data by month with means
You stored the date column as a factor. You could either read this column as a date type or convert it to date format in R.
For the sample data:
location <- c('Afghanistan', 'Colombia', ' Democratic Republic of Congo', 'India', 'Iraq', 'Lebanon', 'Lebanon')
date <- factor(c('24/02/2020', '25/02/2020', '26/02/2020', '27/02/2020', '28/02/2020', '26/02/2020', '27/02/2020'))
total_cases_per_million <- c(0.026, 0.026, 0.026, 0.026, 0.026, 0.026, 0.052)
stringency_index <- c(8.33, 8.33, 8.33, 8.33, 8.33, 8.33, 10.00)
datacovid <- data.frame(location, date, total_cases_per_million, stringency_index)
You can get the monthly averages of total_cases_per_million and stringency_index for each country by first converting the date column to Date format and then using dplyr's group_by function.
datacovid$date = as.Date(datacovid$date, format = "%d/%m/%Y")
library(dplyr)
datacovid %>%
mutate(month = format(date, "%m")) %>%
group_by(location, month) %>%
summarise(avg_total_cases_per_million=mean(total_cases_per_million), avg_stringency_index=mean(stringency_index))
Alternatively, you can use the lubridate package to extract the month from the date, which does this neatly:
library(lubridate)
datacovid %>%
mutate(month = month(date)) %>%
group_by(location, month) %>%
summarise(avg_total_cases_per_million=mean(total_cases_per_million), avg_stringency_index=mean(stringency_index))
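When a date column arrives as a factor, it can help to check the conversion before grouping, because values that do not match the format string become NA rather than raising an error. A minimal sketch with made-up values (the third one is deliberately invalid; parsed is an illustrative name):

```r
# Hypothetical factor column; "31/02/2020" is not a real date
date <- factor(c("24/02/2020", "25/02/2020", "31/02/2020"))

# Convert via character so the intent is explicit, then parse day/month/year
parsed <- as.Date(as.character(date), format = "%d/%m/%Y")

class(parsed)   # "Date"
is.na(parsed)   # TRUE marks values that failed to parse
```

Rows flagged here would otherwise end up in an NA month group after format() or month().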
How to change year.month format into Year-Month format in R
You can use sub with capturing groups in the regular expression. Passing nsmall = 2 to format keeps the trailing zero, so 1993.1 becomes "1993.10":
df$Month <- sub("^(\\d{4})\\.(\\d{2})$", "\\1-\\2", format(df$Month, nsmall = 2))
df
#> Month GSI
#> 1 1993-01 -0.5756706
#> 2 1993-02 -1.1554924
#> 3 1993-03 -1.0035307
#> 4 1993-04 -0.1069888
#> 5 1993-05 -0.3190359
#> 6 1993-06 0.3036164
#> 7 1993-07 1.2452892
#> 8 1993-08 0.8510437
#> 9 1993-09 1.2468009
#> 10 1993-10 1.4252141
Input Data
df <- structure(list(Month = c(1993.01, 1993.02, 1993.03, 1993.04,
1993.05, 1993.06, 1993.07, 1993.08, 1993.09, 1993.1), GSI = c(-0.57567056,
-1.15549239, -1.00353071, -0.1069888, -0.31903591, 0.30361638,
1.24528915, 0.8510437, 1.24680092, 1.42521406)), class = "data.frame", row.names = c(NA,
-10L))
df
#> Month GSI
#> 1 1993.01 -0.5756706
#> 2 1993.02 -1.1554924
#> 3 1993.03 -1.0035307
#> 4 1993.04 -0.1069888
#> 5 1993.05 -0.3190359
#> 6 1993.06 0.3036164
#> 7 1993.07 1.2452892
#> 8 1993.08 0.8510437
#> 9 1993.09 1.2468009
#> 10 1993.10 1.4252141
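As an alternative to the regular expression, the year and month can be split arithmetically. This is only a sketch, and it assumes every value really has the form year.month with a two-digit month; the vector below reuses a few values from the input data.

```r
Month <- c(1993.01, 1993.02, 1993.09, 1993.1)

year  <- floor(Month)                  # integer part is the year
month <- round((Month - year) * 100)   # round() absorbs floating-point error

# Zero-pad the month to two digits
ym <- sprintf("%d-%02d", as.integer(year), as.integer(month))
ym
# "1993-01" "1993-02" "1993-09" "1993-10"
```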
Group by weekly data and summarize by month in R with dplyr
We can convert each date to its month (represented here by the month-end date), group by it, and take the mean:
library(dplyr)
library(lubridate)
library(zoo)
df1 %>%
group_by(Month = as.Date(as.yearmon(mdy(DATE)), frac = 1)) %>%  # frac = 1 gives the last day of the month
summarise(Average_rate = mean(MORTGAGE30US))
Output:
# A tibble: 151 x 2
# Month Average_rate
# <date> <dbl>
# 1 2008-02-29 5.92
# 2 2008-03-31 5.97
# 3 2008-04-30 5.92
# 4 2008-05-31 6.04
# 5 2008-06-30 6.32
# 6 2008-07-31 6.43
# 7 2008-08-31 6.48
# 8 2008-09-30 6.04
# 9 2008-10-31 6.2
#10 2008-11-30 6.09
# … with 141 more rows
How to filter by dates and grouping months together in R using dplyr
I managed to do it using dplyr, with help from @user108636:
df %>%
select(Date, Price) %>%
arrange(Date) %>%
mutate(Month_Year = substr(Date, 1,7)) %>%
group_by(Month_Year) %>%
summarise(mean(Price, na.rm = TRUE))
The select function keeps only the Date and Price columns. The arrange function sorts the data frame by date, earliest first. The mutate function adds a column that drops the day, leaving, for example:
Month_Year
2015-10
2015-10
2015-11
2015-12
2015-12
The group_by function groups rows from the same month together, and the summarise function calculates the mean price for each month.
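The substr step relies on the dates printing as YYYY-MM-DD, so the first seven characters are the year and month. A short sketch with made-up dates shows that this matches format() for Date values, and what happens with a differently formatted string:

```r
Date <- as.Date(c("2015-10-03", "2015-10-21", "2015-11-09"))

by_substr <- substr(Date, 1, 7)       # Date is coerced to "YYYY-MM-DD" text first
by_format <- format(Date, "%Y-%m")    # same key, stated explicitly

identical(by_substr, by_format)       # TRUE

# With day-first text the first seven characters are not a year-month
substr("21/10/2015", 1, 7)            # "21/10/2"
```

If the Date column is stored as text in another layout, converting it with as.Date() first avoids this.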
How to group dates into years, when the year starts on a month other than January
One way of solving this problem is to define a sequence of break points and the associated labels, for example:
date<-as.Date(c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05"))
#define break points
cutpoints<-seq.Date(as.Date("1999-12-01"), by="1 year", length.out = 23)
#define labels (1 less than the number of breaks)
names<-seq.Date(as.Date("2000-01-01"), by="1 year", length.out = 22)
cut(date, breaks=cutpoints, labels = names)
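The same grouping can also be computed arithmetically, without listing break points. The sketch below assumes, as in the break points above, that each year starts on 1 December and is labelled by the following calendar year; start_month and group_year are illustrative names, and the rule as written only suits start months after January.

```r
d <- as.Date(c("1999-11-30", "1999-12-01", "2000-01-05",
               "2000-11-30", "2000-12-01"))

start_month <- 12  # assumption: the year begins on 1 December

yr <- as.integer(format(d, "%Y"))
mo <- as.integer(format(d, "%m"))

# Dates in or after the start month roll over into the next labelled year
group_year <- yr + (mo >= start_month)
group_year
# 1999 2000 2000 2000 2001
```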
Create end of the month date from a date variable
To get the ends of months, you could create a Date vector containing the 1st of each subsequent month and subtract 1 day:
date.end.month <- seq(as.Date("2012-02-01"), length.out = 4, by = "months") - 1
date.end.month
[1] "2012-01-31" "2012-02-29" "2012-03-31" "2012-04-30"
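The same idea extends to arbitrary dates: find the 1st of the following month and step back one day. A base R sketch, where end_of_month is a hypothetical helper name and not part of any package:

```r
end_of_month <- function(d) {
  # First day of the month each date falls in
  first <- as.Date(format(d, "%Y-%m-01"))
  # Adding 31 days to the 1st always lands in the next month;
  # snap that back to the 1st of that month
  next_first <- as.Date(format(first + 31, "%Y-%m-01"))
  next_first - 1
}

x <- as.Date(c("2012-02-10", "2012-12-31", "2011-02-01"))
end_of_month(x)
# "2012-02-29" "2012-12-31" "2011-02-28"
```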
Format Date to Year-Month in R
lubridate only handles dates, and dates have days. However, as alistaire mentions, you can floor them by month if you want to work monthly:
library(tidyverse)
library(lubridate)  # provides floor_date() and as_date(); attached by tidyverse only from 2.0.0
df_month <-
df %>%
mutate(Date = floor_date(as_date(Date), "month"))
If you want to aggregate by month, for example, just use group_by() and summarize().
df_month %>%
group_by(Date) %>%
summarize(N = sum(N)) %>%
ungroup()
#> # A tibble: 4 x 2
#> Date N
#> <date> <dbl>
#> 1 2017-01-01 59
#> 2 2018-01-01 20
#> 3 2018-02-01 33
#> 4 2018-03-01 45