Calculating Mean for Every N Values from a Vector

calculating mean for every n values from a vector

I would use

 colMeans(matrix(a, 60))
.colMeans(a, 60, length(a) / 60) # more efficient (without reshaping to matrix)
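
To see why both lines give the same result, here is a minimal sketch on made-up data (the vector `v` and the block size 3 are only for illustration). `matrix()` fills column by column, so each column ends up holding one consecutive block.

```r
# Hypothetical toy data: 12 values, averaged in blocks of 3
v <- c(2, 4, 6, 1, 1, 1, 10, 20, 30, 5, 7, 9)

# Reshape to 3 rows: column j holds v[(3 * j - 2):(3 * j)]
m <- matrix(v, nrow = 3)
block_means <- colMeans(m)

# Same computation without building the matrix; the dimensions are passed explicitly
block_means_fast <- .colMeans(v, 3, length(v) / 3)

print(block_means)
```

Both calls assume that the length of the vector is an exact multiple of the block size.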

Enhancement on user adunaic's request

This only works if there are 60x100 data points. If you have an incomplete 60 at the end then this errors. It would be good to have a general solution for others looking at this problem for ideas.

BinMean <- function (vec, every, na.rm = FALSE) {
  n <- length(vec)
  x <- .colMeans(vec, every, n %/% every, na.rm)
  r <- n %% every
  if (r) x <- c(x, mean.default(vec[(n - r + 1):n], na.rm = na.rm))
  x
}

a <- 1:103
BinMean(a, every = 10)
# [1] 5.5 15.5 25.5 35.5 45.5 55.5 65.5 75.5 85.5 95.5 102.0
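
A different way to handle the incomplete last bin, sketched here as an assumption rather than as part of the answer above, is to pad the vector with NA up to a multiple of `every` and let `colMeans` drop the padding. The helper name `bin_mean_padded` is made up. Note that `na.rm = TRUE` also drops any NA already present in the data, so this is not a drop-in replacement when missing values should propagate.

```r
# Hypothetical helper: pad with NA so the length is a multiple of `every`
bin_mean_padded <- function(vec, every) {
  n_bins <- ceiling(length(vec) / every)
  padded <- c(vec, rep(NA, n_bins * every - length(vec)))
  # na.rm = TRUE drops the padding (and any NA already in vec)
  colMeans(matrix(padded, nrow = every), na.rm = TRUE)
}

res_padded <- bin_mean_padded(1:103, every = 10)
print(res_padded)
```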

Alternative solution with group-by operation (less efficient)

BinMean2 <- function (vec, every, na.rm = FALSE) {
  grp <- as.integer(ceiling(seq_along(vec) / every))
  grp <- structure(grp, class = "factor",
                   levels = as.character(seq_len(grp[length(grp)])))
  lst <- .Internal(split(vec, grp))
  unlist(lapply(lst, mean.default, na.rm = na.rm), use.names = FALSE)
}
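
The same group-by idea can also be sketched with the documented `tapply()` instead of `.Internal()`. The function name `bin_mean_tapply` is made up for this illustration, and no claim is made about its speed relative to the versions above.

```r
# Hypothetical name; same grouping idea as BinMean2, using tapply()
bin_mean_tapply <- function(vec, every, na.rm = FALSE) {
  # Element i goes to group ceiling(i / every): 1, 1, ..., 2, 2, ...
  grp <- ceiling(seq_along(vec) / every)
  # as.vector() strips the array attributes and group labels from tapply()
  as.vector(tapply(vec, grp, mean, na.rm = na.rm))
}

res_tapply <- bin_mean_tapply(1:103, every = 10)
print(res_tapply)
```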

Speed

library(microbenchmark)
a <- runif(1e+4)
microbenchmark(BinMean(a, 100), BinMean2(a, 100))
# Unit: microseconds
#              expr      min        lq       mean    median        uq       max
#   BinMean(a, 100)   40.400   42.1095   54.21286   48.3915   57.6555   205.702
#  BinMean2(a, 100) 1216.823 1335.7920 1758.90267 1434.9090 1563.1535 21467.542

Calculate mean of every nth element

Another possible solution, using base R:

rowMeans(matrix(my.vec, 24, 31))

#> [1] -0.9354839 -0.3548387 -1.0322581 2.5161290 2.1290323 0.7419355
#> [7] 1.3870968 1.4838710 0.9032258 -1.9032258 4.2903226 -0.4193548
#> [13] -1.9354839 -3.1935484 -2.1935484 2.0322581 0.2580645 2.4193548
#> [19] 0.8064516 0.8064516 5.0645161 -0.5806452 -1.2580645 -0.1290323
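
Here the matrix is read row-wise: with 24 rows, row i collects elements i, i + 24, i + 48, and so on, so `rowMeans` averages every 24th element. A small sketch with made-up data (`toy`, with 4 positions and 3 repeats) shows the same pattern:

```r
# Hypothetical toy data: 3 repeats of 4 readings, stored one repeat after another
toy <- c(1, 2, 3, 4,
         5, 6, 7, 8,
         9, 10, 11, 12)

# 4 rows x 3 columns: row i collects elements i, i + 4, i + 8
by_position <- rowMeans(matrix(toy, nrow = 4, ncol = 3))

# Cross-check the first position by explicit indexing
first_position <- mean(toy[seq(1, length(toy), by = 4)])

print(by_position)
```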

R: calculating mean for every n different values from a vector


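# Assumes x holds the size of each consecutive group and y the values to average.
x <- cumsum(x)   # end index of each group
x <- c(0, x)     # prepend 0 so group i spans (x[i] + 1):x[i + 1]
for (i in seq_len(length(x) - 1)) {
  print(mean(y[(x[i] + 1):x[i + 1]]))
}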
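
If the groups have different sizes, a loop-free variant is to label every value with its group number and average per label. This is a sketch under the assumption that one vector gives the group sizes and another the values; the names `sizes` and `vals` are made up.

```r
# Hypothetical inputs: sizes gives the length of each consecutive group in vals
sizes <- c(2, 3, 1)
vals  <- c(10, 20, 1, 2, 3, 7)

# Label each value with its group number: 1 1 2 2 2 3
grp <- rep(seq_along(sizes), times = sizes)

# One mean per group; as.vector() drops the labels tapply() attaches
group_means <- as.vector(tapply(vals, grp, mean))
print(group_means)
```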

How to calculate the mean for every n vectors from a df

Here is base R option

n <- 2 # Mean across every n = 2 columns
do.call(cbind, lapply(seq(1, ncol(df), by = n), function(idx) rowMeans(df[c(idx, idx + 1)])))
# [,1] [,2] [,3]
#[1,] 4 16 28
#[2,] 5 17 29
#[3,] 6 18 30
#[4,] 7 19 31
#[5,] 8 20 32
#[6,] 9 21 33

This returns a matrix rather than a data.frame (which makes more sense here since you're dealing with "all-numeric" data).

Explanation: The idea is a non-overlapping sliding window approach. seq(1, ncol(df), by = n) creates the start indices of the columns (here: 1, 3, 5). We then loop over those indices idx and calculate the row means of df[c(idx, idx + 1)]. This returns a list which we then cbind into a matrix.
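
The window logic can be written so that it follows `n` instead of hard-coding two columns. The sketch below assumes a 6 x 6 data frame filled with 1:36, which is a guess that matches the output shown above, and clamps the last window in case `ncol(df)` is not a multiple of `n`.

```r
# Assumed example data: 6 rows x 6 columns filled with 1:36
df <- as.data.frame(matrix(1:36, ncol = 6))
n <- 3  # average across every n = 3 columns

starts <- seq(1, ncol(df), by = n)          # start column of each window: 1, 4
res <- sapply(starts, function(idx) {
  cols <- idx:min(idx + n - 1, ncol(df))    # clamp the last window to ncol(df)
  rowMeans(df[cols])
})
print(res)
```

`sapply` binds the per-window row means into a matrix with one column per window.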


As a minor modification, you can also predefine a data.frame with the right dimensions and then skip the do.call(cbind, ...) step by letting R coerce the list to a data.frame implicitly.

out <- data.frame(matrix(NA, ncol = ncol(df) / n, nrow = nrow(df)))
out[] <- lapply(seq(1, ncol(df), by = n), function(idx) rowMeans(df[c(idx, idx + 1)]))
# X1 X2 X3
#1 4 16 28
#2 5 17 29
#3 6 18 30
#4 7 19 31
#5 8 20 32
#6 9 21 33

Get the average every 10 steps in a vector in R


colMeans(matrix(x, 10))
[1] 0.4 0.7 0.8 0.2 0.0 0.4 -0.4 -0.4 -0.7 0.1

We turn the vector into a matrix with 10 rows, so that each column holds one group of 10 consecutive values, and use colMeans to find the mean of each group. We could also have used rowMeans, but since a matrix is filled column-wise by default we would have to set ncol = 10 and add byrow = TRUE, which is just extra typing.
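
A small sketch of the two fill orders, on a made-up vector `v` with groups of 3, shows that both routes give the same means:

```r
# Hypothetical toy vector: 6 values, averaged in groups of 3
v <- c(1, 2, 3, 10, 20, 30)

# Column-wise fill (default): each column is one group
by_col <- colMeans(matrix(v, nrow = 3))

# Row-wise fill: each row is one group, so rowMeans() gives the same means
by_row <- rowMeans(matrix(v, ncol = 3, byrow = TRUE))

print(by_col)
print(by_row)
```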

We can test our answer by explicitly finding the mean of a few of the subsetted vectors.

#Test
mean(x[1:10])
[1] 0.4
mean(x[11:20])
[1] 0.7

Data

x <- c(0, 1, 0, -1, 0, 0, 0, 2, 2, 0, -1, 2, 4, 0, 0, -1, 0, 0, 1, 
2, 4, 0, 1, 0, 0, 0, -2, 3, 1, 1, 0, 1, 0, 0, 0, 1, -1, 1, 0,
0, 1, 0, 1, 1, -1, -1, -2, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1,
-1, -1, -1, 0, 0, 0, -2, 0, 0, 0, 0, 0, 0, 0, 0, -1, 1, -1, -1,
-2, 0, -2, -3, -2, -1, 0, 0, 2, 0, 0, -1, 0, 0, 0, -1, 0, -1,
1, 1, 0, 1)

Creating the mean average of every nth object in a specific column of a dataframe

Just use a combination of rowMeans and subsetting. So something like:

n = 5
rowMeans(data[seq(1, nrow(data), n),])

Alternatively, you could use apply

## rowMeans is better, but 
## if you wanted to calculate the median (say)
## Just change mean to median below
apply(data[seq(1, nrow(data), n),], 1, mean)
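
To make the subsetting step concrete, here is a sketch on a made-up data frame (the columns `a` and `b` are invented for illustration). It shows which rows `seq(1, nrow(data), n)` selects, the row means across columns for those rows, and, as a separate assumption about what may be wanted, the mean of every nth value within a single column.

```r
# Hypothetical data frame with 10 rows and two numeric columns
data <- data.frame(a = 1:10, b = 11:20)
n <- 5

rows <- seq(1, nrow(data), n)    # selects rows 1 and 6
picked <- data[rows, ]

row_means <- rowMeans(picked)    # mean across columns within each picked row
col_mean <- mean(data$a[rows])   # mean of every nth value in column a only

print(row_means)
print(col_mean)
```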

