Rolling mean (moving average) by group/id with dplyr
If you are not committed to dplyr, this should work:
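get.mav <- function(bp, n = 2) {
  require(zoo)
  # if the series starts with NA, seed it with the group mean
  if (is.na(bp[1])) bp[1] <- mean(bp, na.rm = TRUE)
  # carry the last observed value forward over remaining NAs
  bp <- na.locf(bp, na.rm = FALSE)
  # too few observations for a full window: return the values as they are
  if (length(bp) < n) return(bp)
  # keep the first n-1 values, then append the right-aligned rolling mean
  c(bp[1:(n - 1)], rollapply(bp, width = n, mean, align = "right"))
}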
test <- with(test,test[order(ID,YEAR_VISIT),])
test$BLOOD_PRESSURE_UPDATED <-
unlist(aggregate(BLOOD_PRESSURE~ID,test,get.mav,na.action=NULL,n=2)$BLOOD_PRESSURE)
test
# ID AGE YEAR_VISIT BLOOD_PRESSURE TREATMENT BLOOD_PRESSURE_UPDATED
# 1 1 20 2000 NA 3 134.6667
# 2 1 21 2001 129 2 131.8333
# 3 1 22 2002 145 3 137.0000
# 4 1 22 2002 130 2 137.5000
# 5 2 23 2003 NA NA 130.0000
# 6 2 30 2010 150 2 140.0000
# 7 2 31 2011 110 3 130.0000
# ...
This also works for window widths greater than 2; pass a larger n.
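As a quick check of the wider-window case, here is a minimal sketch that calls the same function with n = 3 on a single made-up series (the function is repeated only so the snippet runs on its own; the input vector is illustrative, not the asker's data):

```r
library(zoo)

# Same function as above, repeated so this snippet is self-contained
get.mav <- function(bp, n = 2) {
  if (is.na(bp[1])) bp[1] <- mean(bp, na.rm = TRUE)
  bp <- na.locf(bp, na.rm = FALSE)
  if (length(bp) < n) return(bp)
  c(bp[1:(n - 1)], rollapply(bp, width = n, mean, align = "right"))
}

# Illustrative series with a leading NA
bp <- c(NA, 129, 145, 130)

# The leading NA is replaced by the series mean, the first n-1 = 2 values
# are kept as they are, and the rest are 3-point right-aligned means
mav3 <- get.mav(bp, n = 3)
mav3
```

Note that with n = 3 the first two positions hold raw (or mean-filled) values rather than averages, which may or may not be what you want at the start of each group.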
And here's a data.table solution, which is likely to be much faster if your dataset is large.
library(data.table)
setDT(test) # converts test to a data.table in place
setkey(test,ID,YEAR_VISIT)
test[,BLOOD_PRESSURE_UPDATED:=as.numeric(get.mav(BLOOD_PRESSURE,2)),by=ID]
test
# ID AGE YEAR_VISIT BLOOD_PRESSURE TREATMENT BLOOD_PRESSURE_UPDATED
# 1: 1 20 2000 NA 3 134.6667
# 2: 1 21 2001 129 2 131.8333
# 3: 1 22 2002 145 3 137.0000
# 4: 1 22 2002 130 2 137.5000
# 5: 2 23 2003 NA NA 130.0000
# 6: 2 30 2010 150 2 140.0000
# 7: 2 31 2011 110 3 130.0000
# ...
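Since the question asked about dplyr, the same idea can also be sketched with group_by and mutate. This is a simplified variant, not the function above: it assumes there are no missing values to fill, and it uses zoo's rollapplyr with partial = TRUE so that the first row of each group is averaged over whatever is available. The toy data frame and the column name BP_ROLL are made up for illustration.

```r
library(dplyr)
library(zoo)

# Hypothetical data with no missing values
toy <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2),
  YEAR_VISIT = c(2000, 2001, 2002, 2003, 2010, 2011),
  BLOOD_PRESSURE = c(120, 130, 140, 150, 110, 130)
)

toy_roll <- toy %>%
  arrange(ID, YEAR_VISIT) %>%   # rolling windows depend on row order
  group_by(ID) %>%              # windows never cross ID boundaries
  mutate(BP_ROLL = rollapplyr(BLOOD_PRESSURE, width = 2, FUN = mean,
                              partial = TRUE)) %>%
  ungroup()

toy_roll$BP_ROLL
# 120 125 135 150 130 120
```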
7 Day Moving Average per group - R
You didn't supply enough data to create a weekly rolling mean within the groups, but in principle it could work like this:
library(tidyverse)
library(zoo)
my_data <- my_data %>%
  group_by(ID) %>%
  mutate(roll_mean = rollmean(Count, 2, fill = NA))  # fill = NA replaces the deprecated na.pad = TRUE
Using dplyr, you group_by your ID variable and then create a single new column with the rolling mean. You can then plot this with standard ggplot2 syntax:
ggplot(my_data, aes(Date, Count, group = 1)) +
geom_line(colour = "blue") +
geom_point(colour = "blue") +
geom_point(aes(y = roll_mean), colour = "red") +
facet_wrap(~ID)
#> Warning: Removed 3 rows containing missing values (geom_point).
Data
zzz <- "YYYYMM Date ID Count
201401 01/01/2014 A 151
201401 01/01/2014 B 68
201401 01/01/2014 C 487
201401 02/01/2014 A 198
201401 02/01/2014 B 97
201401 02/01/2014 C 403"
my_data <- read_table(zzz)
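With at least seven observations per group, the same pattern gives an actual 7-day window. The sketch below uses made-up daily counts and assumes exactly one row per ID per day with no gaps in the dates; with missing days, a 7-row window would not correspond to 7 calendar days. The column name roll_mean_7 is hypothetical.

```r
library(dplyr)
library(zoo)

# Hypothetical daily counts: 10 consecutive days for each of two IDs
daily <- data.frame(
  ID = rep(c("A", "B"), each = 10),
  Date = rep(seq(as.Date("2014-01-01"), by = "day", length.out = 10), times = 2),
  Count = c(1:10, seq(10, 100, by = 10))
)

daily_roll <- daily %>%
  arrange(ID, Date) %>%
  group_by(ID) %>%
  # right-aligned: each value averages the current day and the 6 days before it
  mutate(roll_mean_7 = rollmeanr(Count, k = 7, fill = NA)) %>%
  ungroup()

# The first 6 rows of each group are NA because the window is incomplete
daily_roll$roll_mean_7[7:10]
# 4 5 6 7
```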
Grouped moving average in r
Here is a rollapply solution. Note that it appears that you want the average of the prior two or three rows in the same group, i.e. excluding the data on the current row.
library(zoo)
roll <- function(x, n) {
if (length(x) <= n) NA
else rollapply(x, list(-seq(n)), mean, fill = NA)
}
transform(DF, AVG2 = ave(Score, school, Student, FUN = function(x) roll(x, 2)),
AVG3 = ave(Score, school, Student, FUN = function(x) roll(x, 3)))
giving:
school Student Score AVG2 AVG3
1 I S 5 NA NA
2 B S 2 NA NA
3 B S 7 NA NA
4 B O 3 NA NA
5 B O 9 NA NA
6 I O 6 NA NA
7 I O 3 NA NA
8 I S 7 NA NA
9 I O 1 4.5 NA
10 B S 7 4.5 NA
11 I S 3 6.0 NA
12 I O 8 2.0 3.333333
13 B S 3 7.0 5.333333
14 I O 4 4.5 4.000000
15 B O 1 6.0 NA
16 I S 9 5.0 5.000000
17 B S 4 5.0 5.666667
18 B O 6 5.0 4.333333
19 I S 3 6.0 6.333333
20 I O 8 6.0 4.333333
21 B S 3 3.5 4.666667
22 I O 4 6.0 6.666667
23 B O 1 3.5 5.333333
24 I S 9 6.0 5.000000
25 B S 4 3.5 3.333333
26 B O 6 3.5 2.666667
27 I J 6 NA NA
Update: Fixed roll.
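To see what the list(-seq(n)) width does, here is a small sketch on a single made-up vector. Passing a list of offsets to rollapply selects positions relative to the current row, so offsets -1 and -2 average the two previous values and leave out the current one. An ordinary right-aligned rolling mean is shown next to it for contrast.

```r
library(zoo)

x <- c(5, 7, 3, 9, 4)  # illustrative values

# Offsets -1 and -2: mean of the two PRIOR values, current value excluded
prior2 <- rollapply(x, list(-seq(2)), mean, fill = NA)
prior2
# NA NA 6 5 6

# Ordinary right-aligned window of width 2: current value included
incl2 <- rollmeanr(x, k = 2, fill = NA)
incl2
# NA 6.0 5.0 6.0 6.5
```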
Calculating (something similar to) moving averages with grouped data in R?
There are several problems:
- By writing flights$ the code is telling it to override the grouping and use the original ungrouped vector. Remove flights$.
- summarize is used when one row per group is desired, but here it appears we want a result having the same number of rows as the input, so use mutate rather than summarize.
- There are unneeded parentheses here and while they are not wrong it makes it harder to read. When expressions are potentially ambiguous or rely on rules the reader may have to look up it is a good idea to use extra parentheses, but that is not the situation here.
- ungroup at the end so we are not left with a grouped data frame.
- dplyr clobbers lag and filter in base R so it will conflict with many other packages. Always exclude these in the library statement. This does not affect the code here since neither of those are used, but as a precaution I always do that.
- Seems unnecessary to load all of the tidyverse when the code is only using dplyr and its dependencies.
library(dplyr, exclude = c("lag", "filter"))
library(nycflights13)
library(zoo)
delay_rate <- flights %>%
group_by(year, month, day) %>%
mutate(delay_rate = rollsumr(dep_delay, k = 7, fill = NA) /
rollsumr(arr_delay, k = 7, fill = NA)) %>%
ungroup
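The first two points can be illustrated on a tiny made-up data frame (the names df, g, x and m are hypothetical). This sketch shows how a bare column name respects the grouping, how df$x bypasses it, and how mutate and summarise differ in the number of rows returned:

```r
library(dplyr, exclude = c("lag", "filter"))

df <- data.frame(g = c("a", "a", "b", "b"), x = c(1, 3, 10, 30))

# Bare column name: mean(x) is computed within each group, one row per input row
per_row <- df %>% group_by(g) %>% mutate(m = mean(x)) %>% ungroup()
per_row$m
# 2 2 20 20

# df$x refers to the whole ungrouped column, so the grouping is ignored
bypassed <- df %>% group_by(g) %>% mutate(m = mean(df$x)) %>% ungroup()
bypassed$m
# 11 11 11 11

# summarise collapses the result to one row per group
per_group <- df %>% group_by(g) %>% summarise(m = mean(x))
nrow(per_group)
# 2
```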
Moving average with grouped data
It's a little ambiguous, but I think you want this:
test <- cbind(time=rownames(test), test) # first add a time variable
# then create a list with rolling mean for each id and time
ls1 <- lapply(seq_along(test$time),
function(x) cbind(time=x, # time variable
with(test[test$time %in% 1:x, ],
aggregate(list(VES_2A=VES_2A),
list(Index=Index), mean)) # rolling mean
))
tot <- transform(t(sapply(ls1, colMeans)), Index="total") # optionally add a total column
long <- rbind(do.call(rbind, ls1), tot) # bind all rows together into long format data frame
wide <- reshape2::dcast(long, time ~ Index) # reshape to wide w/ e.g. reshape2::dcast()
rm(ls1, tot) # clean up
Yielding
> wide
time 1 2 3 5 6 total
1 1 1.00 NA NA NA NA 1.000000
2 2 1.00 NA NA NA NA 1.000000
3 3 1.00 NA NA NA NA 1.000000
4 4 0.75 NA NA NA NA 0.750000
5 5 0.75 3.000000 NA NA NA 1.875000
6 6 0.75 2.000000 NA NA NA 1.375000
7 7 0.75 1.666667 NA NA NA 1.208333
8 8 0.75 1.500000 NA NA NA 1.125000
9 9 0.75 1.500000 3.000000 NA NA 1.750000
10 10 0.75 1.500000 2.500000 NA NA 1.583333
11 11 0.75 1.500000 2.333333 NA NA 1.527778
12 12 0.75 1.500000 1.750000 NA NA 1.333333
13 13 0.75 1.500000 1.750000 2.000000 NA 1.500000
14 14 0.75 1.500000 1.750000 1.000000 NA 1.250000
15 15 0.75 1.500000 1.750000 1.333333 NA 1.333333
16 16 0.75 1.500000 1.750000 1.250000 NA 1.312500
17 17 0.75 1.500000 1.750000 1.250000 1.000000 1.250000
18 18 0.75 1.500000 1.750000 1.250000 1.500000 1.350000
19 19 0.75 1.500000 1.750000 1.250000 1.333333 1.316667
20 20 0.75 1.500000 1.750000 1.250000 1.250000 1.300000
Plot
library(ggplot2)
ggplot(long, aes(time, VES_2A, color=Index)) +
geom_line()
Tell me what you think, hope that's what you've wanted.
Data
test <- structure(list(VES_2A = c(1L, 1L, 1L, 0L, 3L, 1L, 1L, 1L, 3L,
2L, 2L, 0L, 2L, 0L, 2L, 1L, 1L, 2L, 1L, 1L), Index = c(1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 6L, 6L,
6L, 6L)), class = "data.frame", row.names = c(NA, -20L))
Moving Averages on multiple columns - Grouped Data
We could use rollmean from the zoo package, in combination with data.table.
library(data.table)
library(zoo)
setDT(df)[, c("Qty.mean","To.mean") := lapply(.SD, rollmean, k = 3, fill = NA, align = "right"),
.SDcols = c("Qty","To"), by = Section]
> df
# Week Section Qty To Qty.mean To.mean
#1: 1 a 145.4814 73.49183 NA NA
#2: 2 a 348.9198 51.44893 NA NA
#3: 3 a 343.7099 50.67283 279.3703 58.53786
#4: 4 a 349.3518 47.46891 347.3271 49.86356
#5: 5 a 444.3662 49.28904 379.1426 49.14359
#6: 1 b 356.1242 52.66450 NA NA
#7: 2 b 103.7983 52.10773 NA NA
#8: 3 b 193.0202 46.36184 217.6476 50.37802
#9: 4 b 366.4335 41.59984 221.0840 46.68980
#10: 5 b 305.7005 48.75198 288.3847 45.57122
#11: 1 c 377.4365 72.42394 NA NA
#12: 2 c 317.9899 61.02790 NA NA
#13: 3 c 213.0934 76.58633 302.8400 70.01272
#14: 4 c 469.3734 73.25380 333.4856 70.28934
#15: 5 c 216.9263 41.83081 299.7977 63.89031
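As a variation, data.table has its own rolling mean, frollmean, which avoids the zoo dependency. The sketch below swaps it in for rollmean and runs on a small made-up table, not the data shown above; frollmean defaults to a right-aligned window filled with NA, which matches the arguments used in the zoo version.

```r
library(data.table)

# Hypothetical data: two sections with five weeks each
dt <- data.table(
  Week    = rep(1:5, times = 2),
  Section = rep(c("a", "b"), each = 5),
  Qty     = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
)
dt[, To := Qty * 2]

# frollmean accepts a list of columns (.SD) and returns one vector per column
dt[, c("Qty.mean", "To.mean") := frollmean(.SD, n = 3),
   by = Section, .SDcols = c("Qty", "To")]

dt$Qty.mean
# NA NA 2 3 4 NA NA 20 30 40
```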