Moving Variance in R


Consider the zoo package. For comparison, base R's filter() (from the stats package) gives:

> filter(1:100, rep(1/3,3))
Time Series:
Start = 1
End = 100
Frequency = 1
[1] NA 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
[51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
[76] 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 NA

whereas rollmean() in zoo gives:

> rollmean(1:100, k = 3, na.pad = TRUE)
[1] NA 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
[51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
[76] 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 NA

which is the same (for a 3 point moving average in this example).
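As a quick check of that equivalence, here is a minimal sketch. It uses fill = NA rather than the deprecated na.pad = TRUE, and calls stats::filter() with its namespace because dplyr::filter() masks it once dplyr is attached; the variable names are illustrative only.

```r
library(zoo)

x <- 1:100

# 3-point moving average via a convolution filter with equal weights.
ma_filter <- stats::filter(x, rep(1/3, 3))

# The same moving average via zoo; fill = NA pads both ends with NA.
ma_zoo <- rollmean(x, k = 3, fill = NA)

# filter() returns a "ts" object, so compare as plain numeric vectors.
same <- all.equal(as.numeric(ma_filter), as.numeric(ma_zoo))
print(same)
```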

Whilst zoo doesn't have a rollsd() or rollvar() it does have rollapply(), which works like the apply() functions to apply any R function to the specified window.

> rollapply(1:100, width = 3, FUN = sd, na.pad = TRUE)
[1] NA 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[26] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[51] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[76] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 NA
Warning message:
In rollapply.zoo(zoo(data), ...) : na.pad argument is deprecated

or on something more interesting (vec here is a numeric vector of 100 values):

> rollapply(vec, width = 3, FUN = sd, na.pad = TRUE)
[1] NA 0.3655067 0.8472871 0.5660495 0.3491970 0.4732417 0.9236859
[8] 0.8075226 1.8725851 1.1930784 0.6329325 1.1412416 0.8430772 0.5808005
[15] 0.3838545 1.1738170 1.1655400 1.3241700 0.6876834 0.1534157 0.4858477
[22] 0.9843506 0.6002713 0.6897541 2.0619563 2.5675788 6.3522039 6.0066864
[29] 6.2618432 5.1704866 2.1360853 2.5602557 1.0408528 1.0316396 4.9441628
[36] 5.0319314 5.7589716 3.2425000 4.8788158 2.0847286 4.5199291 2.5323486
[43] 2.1987149 1.8393000 1.2278639 1.5998965 1.5341485 4.4287108 4.4159166
[50] 4.3224546 3.6959067 4.9826264 5.3134044 8.4084322 9.1249234 7.5506725
[57] 3.8499136 3.9680487 5.6362296 4.9124095 4.3452706 4.0227141 4.5867559
[64] 4.7350394 4.3203807 4.4506799 7.2759499 7.6536424 7.8487654 2.0905576
[71] 4.0056880 5.6209853 1.5551659 1.3615268 2.8469458 2.8323588 1.9848578
[78] 1.1201124 1.4248380 1.7802571 1.4281773 2.5481935 1.8554451 1.0925410
[85] 2.1823722 2.2788755 2.4205378 2.0733741 0.7462248 1.3873578 1.4265948
[92] 0.7212619 0.7425993 1.0696432 2.4520585 3.0555819 3.1000885 1.0945292
[99] 0.3726928 NA
Warning message:
In rollapply.zoo(zoo(data), ...) : na.pad argument is deprecated

You can get rid of the warning by using the fill = NA argument, as in

> rollapply(vec, width = 3, FUN = sd, fill = NA)
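Since the heading asks about variance rather than standard deviation, the same call works with FUN = var. The sketch below assumes a made-up vec (a random walk), not the data used above, and checks that the rolling variance is the square of the rolling standard deviation.

```r
library(zoo)

set.seed(1)                  # hypothetical data, not the original vec
vec <- cumsum(rnorm(100))

roll_sd  <- rollapply(vec, width = 3, FUN = sd,  fill = NA)
roll_var <- rollapply(vec, width = 3, FUN = var, fill = NA)

# The window is centred by default, so the first and last positions are NA.
head(roll_var)

# Rolling variance should equal the squared rolling standard deviation.
matches <- all.equal(roll_var, roll_sd^2)
print(matches)
```

Pass align = "right" (or use rollapplyr()) if each value should depend only on the current and earlier observations.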

Compute the variance of a moving window in a dataframe

We can use rollapplyr() to perform the rolling computations. Since the data in the question has only 11 rows, we can't use 12-month windows, but 3-month windows illustrate the approach. Remove fill = NA if you want to omit the NA rows, or replace it with partial = TRUE if you want variances computed from fewer observations than the full window near the beginning. If you want a data frame result, use fortify.zoo(zv).

library(zoo)

z <- read.zoo(DF)
zv <- rollapplyr(z, 3, var, fill = NA)
zv

giving this zoo object:

           USA          ARG          BRA          CHL          COL          MEX PER
2012-04-01  NA           NA           NA           NA           NA           NA  NA
2012-05-01  NA           NA           NA           NA           NA           NA  NA
2012-06-01   0 1.287083e-04 4.998008e-04 1.126781e-09 1.237524e-11 5.208793e-06  NA
2012-07-01   0 1.033001e-04 5.217420e-05 9.109406e-10 3.883996e-12 3.565057e-06  NA
2012-08-01   0 9.358558e-06 1.396497e-05 2.060928e-09 4.221043e-12 4.600220e-06  NA
2012-09-01   0 1.113297e-05 3.108380e-08 9.159058e-10 4.826929e-12 7.453672e-07  NA
2012-10-01   0 1.988357e-06 4.498977e-08 2.485889e-10 2.953403e-12 8.001948e-07  NA
2012-11-01   0 3.560373e-06 1.944961e-05 2.615387e-10 1.168389e-11 2.971477e-07  NA
2012-12-01   0 3.717777e-05 2.655440e-05 1.271886e-10 1.814869e-11 4.312436e-07  NA
2013-01-01   0 2.042867e-05 3.268476e-05 2.806455e-10 7.540331e-11 1.231438e-06  NA
2013-02-01   0 4.134729e-07 1.129013e-04 1.186146e-10 1.983651e-11 3.263780e-07  NA
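The variations mentioned in this answer (dropping fill = NA, using partial = TRUE, and converting with fortify.zoo()) can be sketched on a small made-up series. The object names and the two columns A and B are hypothetical and chosen so the results are easy to verify by hand; this is not the data from the question.

```r
library(zoo)

# Hypothetical monthly data with two series.
dates <- seq(as.Date("2012-04-01"), by = "month", length.out = 6)
z <- zoo(cbind(A = c(1, 2, 3, 4, 5, 6),
               B = c(2, 4, 6, 8, 10, 12)),
         order.by = dates)

# No fill: dates without a full 3-month window are dropped from the result.
zv_trim <- rollapplyr(z, 3, var)

# partial = TRUE: early windows use whatever observations are available
# (the variance of a single value is NA).
zv_part <- rollapplyr(z, 3, var, partial = TRUE)

# Convert the zoo result to a data frame; the dates go into an "Index" column.
df_out <- fortify.zoo(zv_part)
df_out
```

With partial = TRUE the second row is computed from two observations only, so those early values are noisier than the full-window ones.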

We can plot the log of the variances like this:

library(ggplot2)
autoplot(log(zv), facet = NULL) + geom_point() + ylab("log(var(.))")

Note

We assume that the starting point is the data frame generated reproducibly below:

Lines <- "Date       USA       ARG       BRA         CHL          COL        MEX PER
2012-04-01 1 0.2271531 0.4970299 0.001956865 0.0005341452 0.07341428 NA
2012-05-01 1 0.2218906 0.4675895 0.001911405 0.0005273186 0.07026524 NA
2012-06-01 1 0.2054076 0.4531661 0.001891352 0.0005292575 0.06897811 NA
2012-07-01 1 0.2033470 0.4596730 0.001950686 0.0005312600 0.07269619 NA
2012-08-01 1 0.1993882 0.4596039 0.001980537 0.0005271514 0.07268987 NA
2012-09-01 1 0.1967152 0.4593390 0.002011212 0.0005305549 0.07418838 NA
2012-10-01 1 0.1972730 0.4597584 0.002002203 0.0005284380 0.07428555 NA
2012-11-01 1 0.1937618 0.4519187 0.001979805 0.0005238670 0.07329656 NA
2012-12-01 1 0.1854037 0.4500448 0.001993309 0.0005323795 0.07453949 NA
2013-01-01 1 0.1866007 0.4607501 0.002013112 0.0005412329 0.07551040 NA
2013-02-01 1 0.1855950 0.4712956 0.002011067 0.0005359562 0.07554661 NA"
DF <- read.table(text = Lines, header = TRUE)

Find the variance over a sliding window in dplyr

Why not use rollapply() from zoo?

library(dplyr)
library(zoo)

df %>%
  group_by(a) %>%
  mutate(var = rollapply(x, 3, var, fill = NA, align = "right"))
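To see what the grouped version does, here is a minimal sketch on a made-up data frame. The column names a and x follow the snippet above, while the values and the output column name roll_var are assumptions for illustration (a name other than var avoids any confusion with the var() function).

```r
library(dplyr)
library(zoo)

# Hypothetical data: two groups of four observations each.
df <- data.frame(a = rep(c("g1", "g2"), each = 4),
                 x = c(1, 2, 3, 4, 10, 20, 30, 40))

res <- df %>%
  group_by(a) %>%
  # Right-aligned window of 3: each value uses the current and two previous rows
  # within the same group, so the window never spans two groups.
  mutate(roll_var = rollapply(x, 3, var, fill = NA, align = "right")) %>%
  ungroup()

res
```

The first two rows of each group are NA because a full window of three observations is not yet available there.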

Questionable computation of Variance in R

R's var() computes the unbiased sample variance (https://en.wikipedia.org/wiki/Variance#Unbiased_sample_variance): the sum of squared deviations from the mean is divided by n - 1 (here 3) instead of n (here 4).
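The difference is easy to reproduce. The sketch below assumes a made-up vector of four values (the original question's data is not shown here) and computes both versions by hand.

```r
# Hypothetical sample with n = 4 observations.
x <- c(1, 2, 3, 4)
n <- length(x)

ss <- sum((x - mean(x))^2)   # sum of squared deviations from the mean

sample_var     <- ss / (n - 1)   # what var() returns (unbiased estimator)
population_var <- ss / n         # divides by n instead

# var() agrees with the n - 1 version.
agrees <- all.equal(var(x), sample_var)
print(agrees)

# The population version can be recovered by rescaling var().
rescaled <- var(x) * (n - 1) / n
print(c(sample = sample_var, population = population_var, rescaled = rescaled))
```

The same rescaling applies inside a rolling window, for example by passing function(w) var(w) * (length(w) - 1) / length(w) as FUN to rollapply().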


