How to Group My Date Variable into Month/Year in R

How do I group my date variable into month/year in R?

Here is an example using dplyr. Use format() with the corresponding date format string, %m for the month and %Y for the year, to derive the grouping columns.

set.seed(123)
df <- data.frame(
  date  = seq.Date(from = as.Date("01/01/1998", "%d/%m/%Y"),
                   to   = as.Date("01/01/2000", "%d/%m/%Y"), by = "day"),
  value = sample(seq(5), 731, replace = TRUE)
)

head(df)
date value
1 1998-01-01 2
2 1998-01-02 4
3 1998-01-03 3
4 1998-01-04 5
5 1998-01-05 5
6 1998-01-06 1

library(dplyr)

df %>%
mutate(month = format(date, "%m"), year = format(date, "%Y")) %>%
group_by(month, year) %>%
summarise(total = sum(value))

Source: local data frame [25 x 3]
Groups: month [?]

month year total
(chr) (chr) (int)
1 01 1998 105
2 01 1999 91
3 01 2000 3
4 02 1998 74
5 02 1999 77
6 03 1998 96
7 03 1999 86
8 04 1998 91
9 04 1999 95
10 05 1998 93
.. ... ... ...
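
As a variation on the approach above, the month and year can be combined into a single key such as "1998-01", which sorts chronologically as plain text. This is only a minimal sketch: the data frame `toy` (every value set to 1) and the column name `year_month` are assumptions made for illustration, not part of the original answer.

```r
library(dplyr)

# Hypothetical toy data: one row per day for three months, each value = 1
toy <- data.frame(
  date  = seq(as.Date("1998-01-01"), as.Date("1998-03-31"), by = "day"),
  value = 1L
)

monthly <- toy %>%
  mutate(year_month = format(date, "%Y-%m")) %>%  # e.g. "1998-01"
  group_by(year_month) %>%
  summarise(total = sum(value), .groups = "drop")

monthly$total
# 31 28 31  (days in Jan, Feb, Mar 1998)
```

With a single key there is one row per calendar month, and the rows come out in date order rather than grouped by month number first.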

How to group data by month in R with dplyr

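Using lubridate::month with plain dplyr. Note that month() returns only the month number, so the same month from different years is counted together: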

library(dplyr)

dummy <- data.frame(
orderdate = seq(as.Date("2020-01-02"),as.Date("2021-07-13"), by = "days")
)

dummy %>%
as_tibble %>%
mutate(month = lubridate::month(orderdate)) %>%
group_by(month) %>%
summarise(n = n())

month n
<dbl> <int>
1 1 61
2 2 57
3 3 62
4 4 60
5 5 62
6 6 60
7 7 44
8 8 31
9 9 30
10 10 31
11 11 30
12 12 31

As a table:

dummy2 <- dummy %>%
as_tibble %>%
mutate(month = lubridate::month(orderdate)) %>%
group_by(month) %>%
summarise(n = n()) %>%
select(n) %>%
t %>%
as.table
colnames(dummy2) <- 1:12
dummy2

1 2 3 4 5 6 7 8 9 10 11 12
n 61 57 62 60 62 60 44 31 30 31 30 31
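
Because the date range above spans 2020 and 2021, months 1 to 7 contain rows from both years (for example, 61 = 30 + 31 for January). If the years should stay separate, one possible sketch is to count by year and month together. The name `by_year_month` is an assumption introduced here.

```r
library(dplyr)

dummy <- data.frame(
  orderdate = seq(as.Date("2020-01-02"), as.Date("2021-07-13"), by = "days")
)

by_year_month <- dummy %>%
  mutate(year  = lubridate::year(orderdate),
         month = lubridate::month(orderdate)) %>%
  count(year, month)   # one row per (year, month) pair, count stored in column n

nrow(by_year_month)
# 19  (12 months of 2020 plus 7 months of 2021)
```

The first row (January 2020) has n = 30 because the sequence starts on 2 January, and the last row (July 2021) has n = 13.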

Grouping daily data by month with means

You stored the date column as a factor. You could either read this column as a date type or convert it to date format in R.

For the sample data:

location <- c('Afghanistan', 'Colombia', 'Democratic Republic of Congo', 'India', 'Iraq', 'Lebanon', 'Lebanon')
date <- factor(c('24/02/2020', '25/02/2020', '26/02/2020', '27/02/2020', '28/02/2020', '26/02/2020', '27/02/2020'))
total_cases_per_million <- c(0.026, 0.026, 0.026, 0.026, 0.026, 0.026, 0.052)
stringency_index <- c(8.33, 8.33, 8.33, 8.33, 8.33, 8.33, 10.00)

datacovid <- data.frame(location, date, total_cases_per_million, stringency_index)

To get the monthly averages of total_cases_per_million and stringency_index for each country, first convert the date column to the Date class and then use dplyr's group_by function.

datacovid$date = as.Date(datacovid$date, format = "%d/%m/%Y")

library(dplyr)

datacovid %>%
mutate(month = format(date, "%m")) %>%
group_by(location, month) %>%
summarise(avg_total_cases_per_million=mean(total_cases_per_million), avg_stringency_index=mean(stringency_index))

Alternatively, you can use the lubridate package, whose month() function extracts the month from the date neatly:

library(lubridate)

datacovid %>%
mutate(month = month(date)) %>%
group_by(location, month) %>%
summarise(avg_total_cases_per_million=mean(total_cases_per_million), avg_stringency_index=mean(stringency_index))
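
When several columns need the same summary, dplyr's across() avoids writing mean() once per column. The following is a sketch under assumptions: `covid_toy` is a made-up three-row subset in the same shape as the sample data, and the `avg_` prefix comes from the `.names` pattern chosen here.

```r
library(dplyr)

# Hypothetical subset shaped like the sample data, with dates already parsed
covid_toy <- data.frame(
  location = c("Lebanon", "Lebanon", "India"),
  date = as.Date(c("26/02/2020", "27/02/2020", "27/02/2020"), format = "%d/%m/%Y"),
  total_cases_per_million = c(0.026, 0.052, 0.026),
  stringency_index = c(8.33, 10.00, 8.33)
)

monthly_avg <- covid_toy %>%
  mutate(month = format(date, "%Y-%m")) %>%
  group_by(location, month) %>%
  summarise(
    # apply mean() to both columns; results are named avg_<column>
    across(c(total_cases_per_million, stringency_index), mean,
           .names = "avg_{.col}"),
    .groups = "drop"
  )

monthly_avg
```

For Lebanon this averages the two February rows (0.039 and 9.165), while India keeps its single row's values.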

How to change year.month format into Year-Month format in R

You can use sub, with capturing groups in the regular expression:

# nsmall = 2 keeps two decimals, so 1993.1 is formatted as "1993.10"
df$Month <- sub("^(\\d{4})\\.(\\d{2})$", "\\1-\\2", format(df$Month, nsmall = 2))

df
#> Month GSI
#> 1 1993-01 -0.5756706
#> 2 1993-02 -1.1554924
#> 3 1993-03 -1.0035307
#> 4 1993-04 -0.1069888
#> 5 1993-05 -0.3190359
#> 6 1993-06 0.3036164
#> 7 1993-07 1.2452892
#> 8 1993-08 0.8510437
#> 9 1993-09 1.2468009
#> 10 1993-10 1.4252141

Input Data

df <- structure(list(Month = c(1993.01, 1993.02, 1993.03, 1993.04, 
1993.05, 1993.06, 1993.07, 1993.08, 1993.09, 1993.1), GSI = c(-0.57567056,
-1.15549239, -1.00353071, -0.1069888, -0.31903591, 0.30361638,
1.24528915, 0.8510437, 1.24680092, 1.42521406)), class = "data.frame", row.names = c(NA,
-10L))

df
#> Month GSI
#> 1 1993.01 -0.5756706
#> 2 1993.02 -1.1554924
#> 3 1993.03 -1.0035307
#> 4 1993.04 -0.1069888
#> 5 1993.05 -0.3190359
#> 6 1993.06 0.3036164
#> 7 1993.07 1.2452892
#> 8 1993.08 0.8510437
#> 9 1993.09 1.2468009
#> 10 1993.10 1.4252141
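
An arithmetic alternative to the regular expression is to split the number into its year and month parts and rebuild the label with sprintf(). This is a sketch only, assuming the values always have the form year.month with a two-digit month. The names `month_num`, `yr`, `mo` and `year_month` are introduced here for illustration.

```r
# Three of the input values, including the tricky 1993.1 (October)
month_num <- c(1993.01, 1993.02, 1993.1)

yr <- as.integer(floor(month_num))
# round() absorbs floating-point error such as 9.999999 instead of 10
mo <- as.integer(round((month_num - yr) * 100))

year_month <- sprintf("%d-%02d", yr, mo)
year_month
# "1993-01" "1993-02" "1993-10"
```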

Group by weekly data and summarize by month in R with dplyr

We can convert each date to its month, represented here by the month-end date, and then take the mean within each group:

library(dplyr)
library(lubridate)
library(zoo)
df1 %>%
group_by(Month = as.Date(as.yearmon(mdy(DATE)), 1)) %>%
summarise(Average_rate = mean(MORTGAGE30US))

Output:

# A tibble: 151 x 2
# Month Average_rate
# <date> <dbl>
# 1 2008-02-29 5.92
# 2 2008-03-31 5.97
# 3 2008-04-30 5.92
# 4 2008-05-31 6.04
# 5 2008-06-30 6.32
# 6 2008-07-31 6.43
# 7 2008-08-31 6.48
# 8 2008-09-30 6.04
# 9 2008-10-31 6.2
#10 2008-11-30 6.09
# … with 141 more rows
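
Since df1 is not shown, here is a small self-contained sketch of the same idea that needs only dplyr and lubridate. It labels each month by its first day via floor_date() rather than by its last day. The rows in `df1_toy` are made-up values, and only the column names DATE and MORTGAGE30US are taken from the code above.

```r
library(dplyr)
library(lubridate)

# Hypothetical weekly observations with month/day/year date strings
df1_toy <- data.frame(
  DATE = c("2/7/2008", "2/14/2008", "3/6/2008", "3/13/2008"),
  MORTGAGE30US = c(5.6, 5.8, 6.0, 6.2)
)

monthly_rate <- df1_toy %>%
  group_by(Month = floor_date(mdy(DATE), "month")) %>%  # first day of each month
  summarise(Average_rate = mean(MORTGAGE30US))

monthly_rate
```

This gives one row for 2008-02-01 (mean 5.7) and one for 2008-03-01 (mean 6.1).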

How to filter by dates and grouping months together in R using dplyr

I managed to do it using all dplyr functions, with help from @user108636

df %>%
select(Date, Price) %>%
arrange(Date) %>%
mutate(Month_Year = substr(Date, 1,7)) %>%
group_by(Month_Year) %>%
summarise(mean(Price, na.rm = TRUE))

The select function keeps the Date and Price columns. The arrange function sorts the data frame by date, earliest first. The mutate function adds a column that drops the day, leaving, for example:

Month_Year
2015-10
2015-10
2015-11
2015-12
2015-12

The group_by function groups rows from the same month together, and the summarise function calculates the mean price for each month.
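
The pipeline above groups by month but does not restrict the date range. If a filter is also needed, one possible sketch adds filter() before the grouping. The data frame `prices`, its values, and the date bounds are all assumptions for illustration, and Date is assumed to be of class Date.

```r
library(dplyr)

# Hypothetical prices; the first and last rows fall outside the window
prices <- data.frame(
  Date  = as.Date(c("2015-09-28", "2015-10-03", "2015-10-20",
                    "2015-11-05", "2016-01-02")),
  Price = c(10, 20, 30, 40, 50)
)

monthly_mean <- prices %>%
  filter(Date >= as.Date("2015-10-01"), Date < as.Date("2016-01-01")) %>%
  mutate(Month_Year = format(Date, "%Y-%m")) %>%
  group_by(Month_Year) %>%
  summarise(mean_price = mean(Price, na.rm = TRUE))

monthly_mean
```

Only October and November 2015 remain, with mean prices of 25 and 40.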

How to group dates into years, when the year starts on a month other than January

One way to solve this problem is to define a sequence of break points and the associated labels, then pass both to cut():

date <- as.Date(c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05"))

# define break points: each "year" starts on 1 December
cutpoints <- seq.Date(as.Date("1999-12-01"), by = "1 year", length.out = 23)
# define labels (one fewer than the number of breaks)
year_labels <- seq.Date(as.Date("2000-01-01"), by = "1 year", length.out = 22)

cut(date, breaks = cutpoints, labels = year_labels)
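
The same grouping can also be computed arithmetically, without listing break points in advance. The sketch below assumes, as in the break points above, that each year starts in December and is labelled by the calendar year in which it ends. The names `dates`, `start_month` and `fiscal_year` are introduced here for illustration.

```r
dates <- as.Date(c("1999-11-30", "1999-12-01", "2000-01-15",
                   "2000-11-30", "2000-12-01"))

start_month <- 12L  # assumed first month of the custom year

yr <- as.integer(format(dates, "%Y"))
mo <- as.integer(format(dates, "%m"))

# Dates on or after the start month belong to the next labelled year
fiscal_year <- yr + (mo >= start_month)
fiscal_year
# 1999 2000 2000 2000 2001
```

The result can be used directly as a grouping variable, for example in group_by() or tapply().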

Create end of the month date from a date variable

To get the end of months you could just create a Date vector containing the 1st of all the subsequent months and subtract 1 day.

date.end.month <- seq(as.Date("2012-02-01"), length.out = 4, by = "months") - 1
date.end.month
[1] "2012-01-31" "2012-02-29" "2012-03-31" "2012-04-30"
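
To find the month end for arbitrary dates already stored in a variable, the same "first of the next month minus one day" idea can be wrapped in a small helper. This is a base R sketch, and `end_of_month` is a hypothetical function name, not part of any package.

```r
end_of_month <- function(d) {
  first_of_month <- as.Date(format(d, "%Y-%m-01"))
  # 31 days after the 1st always falls within the following month
  first_of_next <- as.Date(format(first_of_month + 31, "%Y-%m-01"))
  first_of_next - 1
}

end_of_month(as.Date(c("2012-02-10", "2012-12-25", "2013-02-01")))
# "2012-02-29" "2012-12-31" "2013-02-28"
```

The helper is vectorised, so it can be applied to a whole date column at once.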

Format Date to Year-Month in R

lubridate only handles dates, and dates have days. However, as alistaire mentions, you can floor them to the month if you want to work monthly:

library(tidyverse)
library(lubridate)  # floor_date() and as_date(); not attached by tidyverse before 2.0.0

df_month <-
  df %>%
  mutate(Date = floor_date(as_date(Date), "month"))

If you want to aggregate by month, for example, just group_by() and summarize():

df_month %>%
group_by(Date) %>%
summarize(N = sum(N)) %>%
ungroup()

#> # A tibble: 4 x 2
#> Date N
#> <date> <dbl>
#> 1 2017-01-01 59
#> 2 2018-01-01 20
#> 3 2018-02-01 33
#> 4 2018-03-01 45
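
If a value that carries only year and month is preferred over a floored Date, the zoo package's yearmon class is one option. This is a sketch, assuming zoo is installed. The vectors `dates`, `ym` and `labels` are made up for illustration.

```r
library(zoo)

dates <- as.Date(c("2017-01-15", "2018-02-14", "2018-02-28"))

ym <- as.yearmon(dates)          # year-month values without a day component
labels <- format(ym, "%Y-%m")    # character labels in Year-Month form
labels
# "2017-01" "2018-02" "2018-02"
```

Both February dates map to the same yearmon value, so `ym` can serve as a grouping key, while `labels` is convenient for display.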

