Grouped Moving Average in R

Rolling mean (moving average) by group/id with dplyr

If you are not committed to to dplyr this should work:

get.mav <- function(bp,n=2){
require(zoo)
if(is.na(bp[1])) bp[1] <- mean(bp,na.rm=TRUE)
bp <- na.locf(bp,na.rm=FALSE)
if(length(bp)<n) return(bp)
c(bp[1:(n-1)],rollapply(bp,width=n,mean,align="right"))
}
test <- with(test,test[order(ID,YEAR_VISIT),])

test$BLOOD_PRESSURE_UPDATED <-
unlist(aggregate(BLOOD_PRESSURE~ID,test,get.mav,na.action=NULL,n=2)$BLOOD_PRESSURE)
test
# ID AGE YEAR_VISIT BLOOD_PRESSURE TREATMENT BLOOD_PRESSURE_UPDATED
# 1 1 20 2000 NA 3 134.6667
# 2 1 21 2001 129 2 131.8333
# 3 1 22 2002 145 3 137.0000
# 4 1 22 2002 130 2 137.5000
# 5 2 23 2003 NA NA 130.0000
# 6 2 30 2010 150 2 140.0000
# 7 2 31 2011 110 3 130.0000
# ...

This works for moving averages > 2 as well.

And here's a data.table solution, which is likely to be much faster if your dataset is large.

library(data.table)
setDT(test) # converts test to a data.table in place
setkey(test,ID,YEAR_VISIT)
test[,BLOOD_PRESSURE_UPDATED:=as.numeric(get.mav(BLOOD_PRESSURE,2)),by=ID]
test
# ID AGE YEAR_VISIT BLOOD_PRESSURE TREATMENT BLOOD_PRESSURE_UPDATED
# 1: 1 20 2000 NA 3 134.6667
# 2: 1 21 2001 129 2 131.8333
# 3: 1 22 2002 145 3 137.0000
# 4: 1 22 2002 130 2 137.5000
# 5: 2 23 2003 NA NA 130.0000
# 6: 2 30 2010 150 2 140.0000
# 7: 2 31 2011 110 3 130.0000
# ...

7 Day Moving Average per group - R

You didn't supply enough data to create a weekly rolling mean within the groups, but in principle it could work like this:

library(tidyverse)
library(zoo)

my_data <- my_data %>%
group_by(ID) %>%
mutate(roll_mean = rollmean(Count, 2, na.pad = T))

Using dplyr you group_by your ID variable, and then create a single new column with the rolling mean. You can plot this then with standard ggplot2-syntax:

ggplot(my_data, aes(Date, Count, group = 1)) +
geom_line(colour = "blue") +
geom_point(colour = "blue") +
geom_point(aes(y = roll_mean), colour = "red") +
facet_wrap(~ID)
#> Warning: Removed 3 rows containing missing values (geom_point).

Sample Image

Data

zzz <- "YYYYMM    Date         ID    Count
201401 01/01/2014 A 151
201401 01/01/2014 B 68
201401 01/01/2014 C 487
201401 02/01/2014 A 198
201401 02/01/2014 B 97
201401 02/01/2014 C 403"

my_data <- read_table(zzz)

Grouped moving average in r

Here is a rollapply solution. Note that it appears that you want the average of the prior two or three rows in the same group, i.e. excluding the data on the current row.

library(zoo)

roll <- function(x, n) {
if (length(x) <= n) NA
else rollapply(x, list(-seq(n)), mean, fill = NA)
}
transform(DF, AVG2 = ave(Score, school, Student, FUN = function(x) roll(x, 2)),
AVG3 = ave(Score, school, Student, FUN = function(x) roll(x, 3)))

giving:

   school Student Score AVG2     AVG3
1 I S 5 NA NA
2 B S 2 NA NA
3 B S 7 NA NA
4 B O 3 NA NA
5 B O 9 NA NA
6 I O 6 NA NA
7 I O 3 NA NA
8 I S 7 NA NA
9 I O 1 4.5 NA
10 B S 7 4.5 NA
11 I S 3 6.0 NA
12 I O 8 2.0 3.333333
13 B S 3 7.0 5.333333
14 I O 4 4.5 4.000000
15 B O 1 6.0 NA
16 I S 9 5.0 5.000000
17 B S 4 5.0 5.666667
18 B O 6 5.0 4.333333
19 I S 3 6.0 6.333333
20 I O 8 6.0 4.333333
21 B S 3 3.5 4.666667
22 I O 4 6.0 6.666667
23 B O 1 3.5 5.333333
24 I S 9 6.0 5.000000
25 B S 4 3.5 3.333333
26 B O 6 3.5 2.666667
27 I J 6 NA NA

Update: Fixed roll.

Calculating (something similar to) moving averages with grouped data in R?

There are several problems:

  1. By writing flights$ the code is telling it to override the grouping and use the original ungrouped vector. Remove flights$ .
  2. summarize is used when one row per group is desired but here it appears we want a result having the same number of rows as the input so use mutate rather than summarize.
  3. There are unneeded parentheses here and while they are not wrong it makes it harder to read. When expressions are potentially ambiguous or rely on rules the reader may have to look up it is a good idea to use extra parentheses but that is not the situation here.
  4. ungroup at the end so we are not left with a grouped data frame.
  5. dplyr clobbers lag and filter in base R so it will conflict with many other packages. Always exclude these in the library statement. This does not affect the code here since neither of those are used but as a precaution I always do that.
  6. Seems unnecessary to load all of the tidyverse when the code is only using dplyr and its dependencies.
library(dplyr, exclude = c("lag", "filter"))
library(nycflights13)
library(zoo)

delay_rate <- flights %>%
group_by(year, month, day) %>%
mutate(delay_rate = rollsumr(dep_delay, k = 7, fill = NA) /
rollsumr(arr_delay, k = 7, fill = NA)) %>%
ungroup

Moving average with grouped data

It's a little ambiguous, but I think you want this:

test <- cbind(time=rownames(test), test)  # first add a time variable

# then create a list with rolling mean for each id and time
ls1 <- lapply(seq_along(test$time),
function(x) cbind(time=x, # time variable
with(test[test$time %in% 1:x, ],
aggregate(list(VES_2A=VES_2A),
list(Index=Index), mean)) # rolling mean
))

tot <- transform(t(sapply(ls1, colMeans)), Index="total") # occasionally add a total column

long <- rbind(do.call(rbind, ls1), tot) # bind all rows together into long format data frame
wide <- reshape2::dcast(long, time ~ Index) # reshape to wide w/ e.g. reshape2::dcast()
rm(ls1, tot) # clean up

Yielding

> wide
time 1 2 3 5 6 total
1 1 1.00 NA NA NA NA 1.000000
2 2 1.00 NA NA NA NA 1.000000
3 3 1.00 NA NA NA NA 1.000000
4 4 0.75 NA NA NA NA 0.750000
5 5 0.75 3.000000 NA NA NA 1.875000
6 6 0.75 2.000000 NA NA NA 1.375000
7 7 0.75 1.666667 NA NA NA 1.208333
8 8 0.75 1.500000 NA NA NA 1.125000
9 9 0.75 1.500000 3.000000 NA NA 1.750000
10 10 0.75 1.500000 2.500000 NA NA 1.583333
11 11 0.75 1.500000 2.333333 NA NA 1.527778
12 12 0.75 1.500000 1.750000 NA NA 1.333333
13 13 0.75 1.500000 1.750000 2.000000 NA 1.500000
14 14 0.75 1.500000 1.750000 1.000000 NA 1.250000
15 15 0.75 1.500000 1.750000 1.333333 NA 1.333333
16 16 0.75 1.500000 1.750000 1.250000 NA 1.312500
17 17 0.75 1.500000 1.750000 1.250000 1.000000 1.250000
18 18 0.75 1.500000 1.750000 1.250000 1.500000 1.350000
19 19 0.75 1.500000 1.750000 1.250000 1.333333 1.316667
20 20 0.75 1.500000 1.750000 1.250000 1.250000 1.300000

Plot

library(ggplot2)
ggplot(long, aes(time, VES_2A, color=Index)) +
geom_line()

Sample Image

Tell me what you think, hope that's what you've wanted.

Data

test <- structure(list(VES_2A = c(1L, 1L, 1L, 0L, 3L, 1L, 1L, 1L, 3L, 
2L, 2L, 0L, 2L, 0L, 2L, 1L, 1L, 2L, 1L, 1L), Index = c(1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 6L, 6L,
6L, 6L)), class = "data.frame", row.names = c(NA, -20L))

Moving Averages on multiple columns - Grouped Data

We could use rollmean from the zoo package, in combination with data.table .

library(data.table)
library(zoo)
setDT(df)[, c("Qty.mean","To.mean") := lapply(.SD, rollmean, k = 3, fill = NA, align = "right"),
.SDcols = c("Qty","To"), by = Section]
> df
# Week Section Qty To Qty.mean To.mean
#1: 1 a 145.4814 73.49183 NA NA
#2: 2 a 348.9198 51.44893 NA NA
#3: 3 a 343.7099 50.67283 279.3703 58.53786
#4: 4 a 349.3518 47.46891 347.3271 49.86356
#5: 5 a 444.3662 49.28904 379.1426 49.14359
#6: 1 b 356.1242 52.66450 NA NA
#7: 2 b 103.7983 52.10773 NA NA
#8: 3 b 193.0202 46.36184 217.6476 50.37802
#9: 4 b 366.4335 41.59984 221.0840 46.68980
#10: 5 b 305.7005 48.75198 288.3847 45.57122
#11: 1 c 377.4365 72.42394 NA NA
#12: 2 c 317.9899 61.02790 NA NA
#13: 3 c 213.0934 76.58633 302.8400 70.01272
#14: 4 c 469.3734 73.25380 333.4856 70.28934
#15: 5 c 216.9263 41.83081 299.7977 63.89031


Related Topics



Leave a reply



Submit