calculating mean for every n values from a vector
I would use
colMeans(matrix(a, 60))
.colMeans(a, 60, length(a) / 60) # more efficient (without reshaping to matrix)
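As a quick check of the two calls above, here is a minimal sketch on a small illustrative vector (12 values in blocks of 4, not the data from the question). The names `via_matrix` and `via_internal` are made up for this example.

```r
# Illustrative vector: 12 values, averaged in blocks of 4
a <- 1:12
via_matrix <- colMeans(matrix(a, 4))            # reshape to 4 x 3, then average each column
via_internal <- .colMeans(a, 4, length(a) / 4)  # treat `a` as a 4 x 3 matrix without reshaping
via_matrix
# [1]  2.5  6.5 10.5
all.equal(via_matrix, via_internal)
# [1] TRUE
```

Both forms assume the vector length is an exact multiple of the block size.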
Enhancement on user adunaic's request
This only works if there are 60x100 data points. If you have an incomplete 60 at the end then this errors. It would be good to have a general solution for others looking at this problem for ideas.
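BinMean <- function (vec, every, na.rm = FALSE) {
  n <- length(vec)
  # means of all complete groups of size `every`
  x <- .colMeans(vec, every, n %/% every, na.rm)
  # mean of the incomplete trailing group, if there is one
  r <- n %% every
  if (r) x <- c(x, mean.default(vec[(n - r + 1):n], na.rm = na.rm))
  x
}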
a <- 1:103
BinMean(a, every = 10)
# [1] 5.5 15.5 25.5 35.5 45.5 55.5 65.5 75.5 85.5 95.5 102.0
Alternative solution with group-by operation (less efficient)
BinMean2 <- function (vec, every, na.rm = FALSE) {
  grp <- as.integer(ceiling(seq_along(vec) / every))
  grp <- structure(grp, class = "factor",
                   levels = as.character(seq_len(grp[length(grp)])) )
  lst <- .Internal(split(vec, grp))
  unlist(lapply(lst, mean.default, na.rm = na.rm), use.names = FALSE)
}
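The same group-by idea can also be sketched with `tapply`, which avoids calling `.Internal` directly. This is a swapped-in variant for illustration, not the answer's own code; the name `BinMean3` is hypothetical, and it is not included in the benchmark below.

```r
# Hypothetical variant: group-by means using only documented base R functions
BinMean3 <- function(vec, every, na.rm = FALSE) {
  grp <- ceiling(seq_along(vec) / every)   # 1,1,...,2,2,... group labels
  # as.vector() drops the names and dim that tapply() attaches
  as.vector(tapply(vec, grp, mean, na.rm = na.rm))
}

a <- 1:103
res <- BinMean3(a, every = 10)
res
# [1]   5.5  15.5  25.5  35.5  45.5  55.5  65.5  75.5  85.5  95.5 102.0
```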
Speed
library(microbenchmark)
a <- runif(1e+4)
microbenchmark(BinMean(a, 100), BinMean2(a, 100))
#Unit: microseconds
# expr min lq mean median uq max
# BinMean(a, 100) 40.400 42.1095 54.21286 48.3915 57.6555 205.702
# BinMean2(a, 100) 1216.823 1335.7920 1758.90267 1434.9090 1563.1535 21467.542
Calculate mean of every nth element
Another possible solution, using base R:
rowMeans(matrix(my.vec, 24, 31))
#> [1] -0.9354839 -0.3548387 -1.0322581 2.5161290 2.1290323 0.7419355
#> [7] 1.3870968 1.4838710 0.9032258 -1.9032258 4.2903226 -0.4193548
#> [13] -1.9354839 -3.1935484 -2.1935484 2.0322581 0.2580645 2.4193548
#> [19] 0.8064516 0.8064516 5.0645161 -0.5806452 -1.2580645 -0.1290323
R: calculating mean for every n different values from a vector
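# Assumes x holds the group sizes and y the values to average
ends <- cumsum(x)                    # last index of each group
starts <- c(1, head(ends, -1) + 1)   # first index of each group
for (i in seq_along(ends)) {
  print(mean(y[starts[i]:ends[i]]))
}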
How to calculate the mean for every n vectors from a df
Here is a base R option:
n <- 2 # Mean across every n = 2 columns
do.call(cbind, lapply(seq(1, ncol(df), by = n), function(idx) rowMeans(df[c(idx, idx + 1)])))
# [,1] [,2] [,3]
#[1,] 4 16 28
#[2,] 5 17 29
#[3,] 6 18 30
#[4,] 7 19 31
#[5,] 8 20 32
#[6,] 9 21 33
This returns a matrix rather than a data.frame (which makes more sense here since you're dealing with "all-numeric" data).
Explanation: The idea is a non-overlapping sliding window approach. seq(1, ncol(df), by = n) creates the start indices of the columns (here: 1, 3, 5). We then loop over those indices idx and calculate the row means of df[c(idx, idx + 1)]. This returns a list, which we then cbind into a matrix.
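The window above is hard-coded to two columns (idx and idx + 1). A minimal sketch of the same approach for an arbitrary window width n follows. The data frame here is hypothetical example data (6 rows, 6 numeric columns), and the min() guard is an added assumption so that a shorter final window does not index past the last column.

```r
# Hypothetical example data: 6 rows, 6 numeric columns holding 1..36
df <- as.data.frame(matrix(1:36, ncol = 6))
n <- 3                                    # window width in columns

starts <- seq(1, ncol(df), by = n)        # start column of each window: 1, 4
res <- do.call(cbind, lapply(starts, function(idx) {
  cols <- idx:min(idx + n - 1, ncol(df))  # columns in this window
  rowMeans(df[cols])
}))
dim(res)
# [1] 6 2
```

With n <- 2 this gives the same row means as the two-column version.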
As a minor modification, you can also predefine a data.frame with the right dimensions and then skip the do.call(cbind, ...) step by having R do an implicit list to data.frame typecast.
out <- data.frame(matrix(NA, ncol = ncol(df) / 2, nrow = nrow(df)))
out[] <- lapply(seq(1, ncol(df), by = n), function(idx) rowMeans(df[c(idx, idx + 1)]))
# X1 X2 X3
#1 4 16 28
#2 5 17 29
#3 6 18 30
#4 7 19 31
#5 8 20 32
#6 9 21 33
Get the average every 10 steps in a vector in R
colMeans(matrix(x, 10))
[1] 0.4 0.7 0.8 0.2 0.0 0.4 -0.4 -0.4 -0.7 0.1
We turn the vector into a matrix with the dimensions matching your desired length and use colMeans to find the mean of each group. We could have also used rowMeans, but since the matrix is populated column-wise by default we would have to add another argument byrow=TRUE and potentially hurt ourselves with all of the extra typing.
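To illustrate the colMeans versus rowMeans point, here is a small sketch on made-up data (the integers 1 to 30, not the vector from the question). The names `by_col` and `by_row` are chosen for this example only.

```r
# Illustrative data: 30 values, averaged in groups of 10
x <- 1:30
by_col <- colMeans(matrix(x, nrow = 10))                # groups fill the columns
by_row <- rowMeans(matrix(x, ncol = 10, byrow = TRUE))  # groups fill the rows
by_col
# [1]  5.5 15.5 25.5
all.equal(by_col, by_row)
# [1] TRUE
```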
We can test our answer by explicitly finding the mean of a few of the subsetted vectors.
#Test
mean(x[1:10])
[1] 0.4
mean(x[11:20])
[1] 0.7
Data
x <- c(0, 1, 0, -1, 0, 0, 0, 2, 2, 0, -1, 2, 4, 0, 0, -1, 0, 0, 1,
2, 4, 0, 1, 0, 0, 0, -2, 3, 1, 1, 0, 1, 0, 0, 0, 1, -1, 1, 0,
0, 1, 0, 1, 1, -1, -1, -2, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1,
-1, -1, -1, 0, 0, 0, -2, 0, 0, 0, 0, 0, 0, 0, 0, -1, 1, -1, -1,
-2, 0, -2, -3, -2, -1, 0, 0, 2, 0, 0, -1, 0, 0, 0, -1, 0, -1,
1, 1, 0, 1)
Creating the mean average of every nth object in a specific column of a dataframe
Just use a combination of rowMeans and subsetting. So something like:
n = 5
rowMeans(data[seq(1, nrow(data), n),])
Alternatively, you could use apply:
## rowMeans is better, but
## if you wanted to calculate the median (say)
## Just change mean to median below
apply(data[seq(1, nrow(data), n),], 1, mean)
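To make clear what the subsetting selects, here is a minimal sketch on a hypothetical two-column data frame (the column names `a` and `b` and the values are invented for illustration). With n = 5 and 10 rows, seq(1, nrow(data), n) picks rows 1 and 6, and rowMeans then averages across the columns of each picked row.

```r
# Hypothetical data: 10 rows, two numeric columns
data <- data.frame(a = 1:10, b = 11:20)
n <- 5

picked <- data[seq(1, nrow(data), n), ]  # rows 1 and 6
means <- rowMeans(picked)                # mean across columns, per picked row
means
#  1  6 
#  6 11 
```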