R: aggregate rows to transform monthly into quarterly data
You may divide every 3 rows into one quarter and calculate the summarising statistics for each variable. Since col1
and col3
have the same value in each quarter we can select any value from those columns, I have selected the 1st one.
library(dplyr)
survey1 %>%
group_by(Quarter = paste0('Q', ceiling(row_number()/3))) %>%
summarise(across(c(col1, col3), first),
col2 = sum(col2)) %>%
select(Quarter, col1, col2, col3)
# Quarter col1 col2 col3
# <chr> <dbl> <dbl> <dbl>
#1 Q1 10 37 12
#2 Q2 20 98 22
A hack can also be to use col1
and col3
as grouping variable since we know they are the same every quarter. This would work provided every quarter has a different value for col1
and col3
combination.
survey1 %>%
group_by(col1, col3) %>%
summarise(col2 = sum(col2), .groups = 'drop')
Aggregating monthly column values into quarterly values
1) If DF
is the input data.frame convert it to a zoo object z
with a "yearmon"
index and then aggregate that to "yearqtr"
:
library(zoo)
toYearmon <- function(y, m) as.yearmon(paste(y, m, sep = "-"))
z <- read.zoo(DF, index = 2:1, FUN = toYearmon)
ag <- aggregate(z, as.yearqtr, sum)
giving:
> ag
2004 Q1 2004 Q2 2004 Q3 2004 Q4
875 820 785 608
2) This would also work:
library(zoo)
yq <- as.yearqtr(as.yearmon(paste(DF$Year, DF$Month), "%Y %m"))
ta <- tapply(DF$DepressionCount, yq, sum)
Convert monthly data to quarterly data using aggregate function
You can extract quarter and year information from Date
and aggregate
:
df2 <- aggregate(x~quarter + year, transform(df1,
quarter = quarters(date), year = as.integer(format(date, '%Y'))), mean)
df2 <- df2[order(df2$year), ]
Or using dplyr
:
library(dplyr)
df1 %>%
group_by(quarter = paste(quarters(date), lubridate::year(date))) %>%
summarise(x = mean(x))
# quarter x
# <chr> <dbl>
# 1 Q1 2000 0.347
# 2 Q1 2001 -0.592
# 3 Q1 2002 0.802
# 4 Q1 2003 0.237
# 5 Q1 2004 -0.00882
# 6 Q1 2005 0.0535
# 7 Q1 2006 0.218
# 8 Q1 2007 0.177
# 9 Q1 2008 -0.258
#10 Q1 2009 0.246
# … with 72 more rows
convert monthly data into quarter by unique column values in R
Here is a potential approach:
library(dplyr)
library(tidyr)
newdf <- unemp %>%
group_by(DATE = lubridate::floor_date(DATE, "quarter") + months(2),
GeoUID,
Industry,
LFS) %>%
summarise(VALUE = mean(VALUE)) %>%
pivot_wider(names_from = LFS, values_from = VALUE) %>%
mutate("Unemployment rate" = Unemployment / `Labour force`) %>%
pivot_longer(`Labour force`:`Unemployment rate`)
newdf
# A tibble: 6 x 5
# Groups: DATE, GeoUID, Industry [2]
DATE GeoUID Industry name value
<date> <dbl> <chr> <chr> <dbl>
1 2015-03-01 35 Agriculture Labour force 500
2 2015-03-01 35 Agriculture Unemployment 200
3 2015-03-01 35 Agriculture Unemployment rate 0.4
4 2015-03-01 35 Construction Labour force 800
5 2015-03-01 35 Construction Unemployment 300
6 2015-03-01 35 Construction Unemployment rate 0.375
Some remarks
- The OP has requested that the quarter should be denoted by an object of class
Date
. Furthermore, the OP has asked to align the dates of the first quarter with the 1st of March. So, the data are grouped by the first day of each quarter plus 2 months. GeoUID
is also considered as a grouping variable. Otherwise, it must not appear in the final result.- For computing
Unemployment rate
, the data is being reshaped forth and back. Presumably, there are other, more efficient approaches where the data need not to be reshaped.
EDIT: Version without reshaping
library(dplyr)
df_quarter <- unemp %>%
group_by(DATE = lubridate::floor_date(DATE, "quarter") + months(2),
GeoUID,
Industry,
LFS) %>%
summarise(VALUE = mean(VALUE))
newdf <- bind_rows(
df_quarter,
df_quarter %>%
summarise(VALUE = VALUE[LFS == "Unemployment"] / VALUE[LFS == "Labour force"],
LFS = "Unemployment rate")
) %>%
arrange(DATE, GeoUID, Industry)
newdf
# A tibble: 6 x 5
# Groups: DATE, GeoUID, Industry [2]
DATE GeoUID Industry LFS VALUE
<date> <dbl> <chr> <chr> <dbl>
1 2015-03-01 35 Agriculture Labour force 500
2 2015-03-01 35 Agriculture Unemployment 200
3 2015-03-01 35 Agriculture Unemployment rate 0.4
4 2015-03-01 35 Construction Labour force 800
5 2015-03-01 35 Construction Unemployment 300
6 2015-03-01 35 Construction Unemployment rate 0.375
how to convert monthly data to quarterly in pandas
you can use pd.PeriodIndex(..., freq='Q') in conjunction with groupby(..., axis=1):
In [63]: df
Out[63]:
1996-04 1996-05 2000-07 2000-08 2010-10 2010-11 2010-12
0 1 2 3 4 1 1 1
1 25 19 37 40 1 2 3
2 10 20 30 40 4 4 5
In [64]: df.groupby(pd.PeriodIndex(df.columns, freq='Q'), axis=1).mean()
Out[64]:
1996Q2 2000Q3 2010Q4
0 1.5 3.5 1.000000
1 22.0 38.5 2.000000
2 15.0 35.0 4.333333
UPDATE: to get columns in a resulting DF as strings intead of period
dtype:
In [66]: res = (df.groupby(pd.PeriodIndex(df.columns, freq='Q'), axis=1)
.mean()
.rename(columns=lambda c: str(c).lower()))
In [67]: res
Out[67]:
1996q2 2000q3 2010q4
0 1.5 3.5 1.000000
1 22.0 38.5 2.000000
2 15.0 35.0 4.333333
In [68]: res.columns.dtype
Out[68]: dtype('O')
Aggregate function doesn't want to respect months in quarters
You can create a new column with month number of the quarter, then extract the data where method
and the month number matches.
library(dplyr)
agg_quarter<-function(data,method){
data %>%
mutate(month = lubridate::month(date3) %% 3,
month = replace(month, month == 0, 3)) %>%
group_by(yq = zoo::as.yearqtr(date3)) %>%
summarise(across(x:z, ~.x[month == method][1]))
}
agg_quarter(df3,1)
# yq x y z
# <yearqtr> <dbl> <dbl> <dbl>
# 1 2000 Q4 NA NA NA
# 2 2001 Q1 -0.340 0.534 0.567
# 3 2001 Q2 1.38 0.865 0.517
# 4 2001 Q3 -0.145 0.238 0.297
# 5 2001 Q4 -1.02 0.208 0.539
# 6 2002 Q1 0.603 0.926 0.0333
# 7 2002 Q2 0.178 0.638 1.92
# 8 2002 Q3 0.106 0.395 0.00905
# 9 2002 Q4 0.862 0.986 0.388
#10 2003 Q1 -0.601 0.805 1.78
# … with 69 more rows
Related Topics
Error in As.Double(Y):Cannot Coerce Type 'S4' to Vector of Type 'Double'
Copy-On-Modify Semantic on a Vector Does Not Append in a Loop. Why
Parallel R on a Windows Cluster
How to 'Subset' a Named Vector in R
Extract Only Quarter from a Date in R
Replace Column Values with Column Name Using Dplyr's Transmute_All
Recode Multiple Columns Using Dplyr
Replace Na with Grouped Means in R
Importing Multiple .CSV Files with Variable Column Types into R
How to Unlock Environment in R
Convert from N X M Matrix to Long Matrix in R
R Shiny - Ui.R Seems to Not Recognize a Dataframe Read by Server.R
How to Find Correct Executable with Sys.Which on Windows
Filter a Column Which Contains Several Keywords
R: How to Create Grid-Graphics