Aggregating Monthly Column Values into Quarterly Values

R: aggregate rows to transform monthly into quarterly data

You may divide every 3 rows into one quarter and calculate the summarising statistics for each variable. Since col1 and col3 have the same value in each quarter we can select any value from those columns, I have selected the 1st one.

library(dplyr)

survey1 %>%
group_by(Quarter = paste0('Q', ceiling(row_number()/3))) %>%
summarise(across(c(col1, col3), first),
col2 = sum(col2)) %>%
select(Quarter, col1, col2, col3)

# Quarter col1 col2 col3
# <chr> <dbl> <dbl> <dbl>
#1 Q1 10 37 12
#2 Q2 20 98 22

A hack can also be to use col1 and col3 as grouping variable since we know they are the same every quarter. This would work provided every quarter has a different value for col1 and col3 combination.

survey1 %>% 
group_by(col1, col3) %>%
summarise(col2 = sum(col2), .groups = 'drop')

Aggregating monthly column values into quarterly values

1) If DF is the input data.frame convert it to a zoo object z with a "yearmon" index and then aggregate that to "yearqtr":

library(zoo)

toYearmon <- function(y, m) as.yearmon(paste(y, m, sep = "-"))
z <- read.zoo(DF, index = 2:1, FUN = toYearmon)
ag <- aggregate(z, as.yearqtr, sum)

giving:

> ag
2004 Q1 2004 Q2 2004 Q3 2004 Q4
875 820 785 608

2) This would also work:

library(zoo)

yq <- as.yearqtr(as.yearmon(paste(DF$Year, DF$Month), "%Y %m"))
ta <- tapply(DF$DepressionCount, yq, sum)

Convert monthly data to quarterly data using aggregate function

You can extract quarter and year information from Date and aggregate :

df2 <- aggregate(x~quarter + year, transform(df1, 
quarter = quarters(date), year = as.integer(format(date, '%Y'))), mean)
df2 <- df2[order(df2$year), ]

Or using dplyr :

library(dplyr)
df1 %>%
group_by(quarter = paste(quarters(date), lubridate::year(date))) %>%
summarise(x = mean(x))

# quarter x
# <chr> <dbl>
# 1 Q1 2000 0.347
# 2 Q1 2001 -0.592
# 3 Q1 2002 0.802
# 4 Q1 2003 0.237
# 5 Q1 2004 -0.00882
# 6 Q1 2005 0.0535
# 7 Q1 2006 0.218
# 8 Q1 2007 0.177
# 9 Q1 2008 -0.258
#10 Q1 2009 0.246
# … with 72 more rows

convert monthly data into quarter by unique column values in R

Here is a potential approach:

library(dplyr)
library(tidyr)
newdf <- unemp %>%
group_by(DATE = lubridate::floor_date(DATE, "quarter") + months(2),
GeoUID,
Industry,
LFS) %>%
summarise(VALUE = mean(VALUE)) %>%
pivot_wider(names_from = LFS, values_from = VALUE) %>%
mutate("Unemployment rate" = Unemployment / `Labour force`) %>%
pivot_longer(`Labour force`:`Unemployment rate`)
newdf
# A tibble: 6 x 5
# Groups: DATE, GeoUID, Industry [2]
DATE GeoUID Industry name value
<date> <dbl> <chr> <chr> <dbl>
1 2015-03-01 35 Agriculture Labour force 500
2 2015-03-01 35 Agriculture Unemployment 200
3 2015-03-01 35 Agriculture Unemployment rate 0.4
4 2015-03-01 35 Construction Labour force 800
5 2015-03-01 35 Construction Unemployment 300
6 2015-03-01 35 Construction Unemployment rate 0.375

Some remarks

  1. The OP has requested that the quarter should be denoted by an object of class Date. Furthermore, the OP has asked to align the dates of the first quarter with the 1st of March. So, the data are grouped by the first day of each quarter plus 2 months.
  2. GeoUID is also considered as a grouping variable. Otherwise, it must not appear in the final result.
  3. For computing Unemployment rate, the data is being reshaped forth and back. Presumably, there are other, more efficient approaches where the data need not to be reshaped.

EDIT: Version without reshaping

library(dplyr)
df_quarter <- unemp %>%
group_by(DATE = lubridate::floor_date(DATE, "quarter") + months(2),
GeoUID,
Industry,
LFS) %>%
summarise(VALUE = mean(VALUE))
newdf <- bind_rows(
df_quarter,
df_quarter %>%
summarise(VALUE = VALUE[LFS == "Unemployment"] / VALUE[LFS == "Labour force"],
LFS = "Unemployment rate")
) %>%
arrange(DATE, GeoUID, Industry)
newdf
# A tibble: 6 x 5
# Groups: DATE, GeoUID, Industry [2]
DATE GeoUID Industry LFS VALUE
<date> <dbl> <chr> <chr> <dbl>
1 2015-03-01 35 Agriculture Labour force 500
2 2015-03-01 35 Agriculture Unemployment 200
3 2015-03-01 35 Agriculture Unemployment rate 0.4
4 2015-03-01 35 Construction Labour force 800
5 2015-03-01 35 Construction Unemployment 300
6 2015-03-01 35 Construction Unemployment rate 0.375

how to convert monthly data to quarterly in pandas

you can use pd.PeriodIndex(..., freq='Q') in conjunction with groupby(..., axis=1):

In [63]: df
Out[63]:
1996-04 1996-05 2000-07 2000-08 2010-10 2010-11 2010-12
0 1 2 3 4 1 1 1
1 25 19 37 40 1 2 3
2 10 20 30 40 4 4 5

In [64]: df.groupby(pd.PeriodIndex(df.columns, freq='Q'), axis=1).mean()
Out[64]:
1996Q2 2000Q3 2010Q4
0 1.5 3.5 1.000000
1 22.0 38.5 2.000000
2 15.0 35.0 4.333333

UPDATE: to get columns in a resulting DF as strings intead of period dtype:

In [66]: res = (df.groupby(pd.PeriodIndex(df.columns, freq='Q'), axis=1)
.mean()
.rename(columns=lambda c: str(c).lower()))

In [67]: res
Out[67]:
1996q2 2000q3 2010q4
0 1.5 3.5 1.000000
1 22.0 38.5 2.000000
2 15.0 35.0 4.333333

In [68]: res.columns.dtype
Out[68]: dtype('O')

Aggregate function doesn't want to respect months in quarters

You can create a new column with month number of the quarter, then extract the data where method and the month number matches.

library(dplyr)

agg_quarter<-function(data,method){
data %>%
mutate(month = lubridate::month(date3) %% 3,
month = replace(month, month == 0, 3)) %>%
group_by(yq = zoo::as.yearqtr(date3)) %>%
summarise(across(x:z, ~.x[month == method][1]))
}
agg_quarter(df3,1)

# yq x y z
# <yearqtr> <dbl> <dbl> <dbl>
# 1 2000 Q4 NA NA NA
# 2 2001 Q1 -0.340 0.534 0.567
# 3 2001 Q2 1.38 0.865 0.517
# 4 2001 Q3 -0.145 0.238 0.297
# 5 2001 Q4 -1.02 0.208 0.539
# 6 2002 Q1 0.603 0.926 0.0333
# 7 2002 Q2 0.178 0.638 1.92
# 8 2002 Q3 0.106 0.395 0.00905
# 9 2002 Q4 0.862 0.986 0.388
#10 2003 Q1 -0.601 0.805 1.78
# … with 69 more rows


Related Topics



Leave a reply



Submit