Calculate Row Means on Subset of Columns

How to calculate row mean from selected columns

Subset each row to the values that exceed that row's mean (taken over the columns in w), then take the mean of what remains.

w <- c("01-01-2018", "02-01-2018", "03-01-2018")  ## define columns

apply(data[, w], 1, function(x) mean(x[x > mean(x)]))
# [1] 3.40 2.75 4.90 -0.10 1.15

Another way is to replace the data points that do not exceed their row mean with NA before calling rowMeans with na.rm=TRUE. This is about 30 times faster.

rowMeans(replace(data, data <= rowMeans(data[, w]), NA), na.rm=TRUE)
# [1] 3.40 2.75 4.90 -0.10 1.15

Data:

data <- structure(list(`01-01-2018` = c(1.2, 3.1, 0.7, -0.3, 2), `02-01-2018` = c(-0.1, 
2.4, 4.9, -3.3, -2.7), `03-01-2018` = c(3.4, -2.6, -1.8, 0.1,
0.3)), class = "data.frame", row.names = c(NA, -5L))
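As a quick sanity check, the two approaches can be run side by side on the sample data. This is only a sketch: the names by_apply, by_replace and sub are introduced here for illustration, and the replace() call is restricted to data[, w] on the assumption that a real data frame might contain other columns.

```r
# Sample data from this answer
data <- structure(list(`01-01-2018` = c(1.2, 3.1, 0.7, -0.3, 2),
                       `02-01-2018` = c(-0.1, 2.4, 4.9, -3.3, -2.7),
                       `03-01-2018` = c(3.4, -2.6, -1.8, 0.1, 0.3)),
                  class = "data.frame", row.names = c(NA, -5L))
w <- c("01-01-2018", "02-01-2018", "03-01-2018")

# Approach 1: row by row, keep only the values above the row mean
by_apply <- apply(data[, w], 1, function(x) mean(x[x > mean(x)]))

# Approach 2: mask values at or below the row mean, then use rowMeans
sub <- data[, w]
by_replace <- rowMeans(replace(sub, sub <= rowMeans(sub), NA), na.rm = TRUE)

# Compare the two results (names are dropped before comparing)
all.equal(unname(by_apply), unname(by_replace))
```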

R tidy row means from subset of columns

In my previous version I thought that rowMeans was the bottleneck, but what actually slows down the calculation is the use of select. It is better to stick with the grep family:

df %>% mutate(A = rowMeans(.[, grepl("^A", names(.))]))
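For readers without dplyr loaded, the same grepl-based column selection can be sketched in base R. The data frame below is made up for illustration, and drop = FALSE is added so the selection stays a data frame even if only one column matches.

```r
# Hypothetical data: two columns starting with "A", one starting with "B"
df <- data.frame(A1 = c(1, 2, 3), A2 = c(3, 4, 5), B1 = c(10, 20, 30))

# Logical index of the columns whose names start with "A"
a_cols <- grepl("^A", names(df))

# Row means over just those columns, stored in a new column A
df$A <- rowMeans(df[, a_cols, drop = FALSE])
df$A
```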

Calculate row means on subset of columns

Create a new data.frame that takes the first column of DF as a column called ID, calculates the mean of all the other fields in each row, and puts the result into a column called 'Means':

data.frame(ID=DF[,1], Means=rowMeans(DF[,-1]))
  ID    Means
1  A 3.666667
2  B 4.333333
3  C 3.333333
4  D 4.666667
5  E 4.333333
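The input DF is not shown here. As a minimal sketch, assume a data frame with an identifier in the first column and numeric values in the rest. The values and the column names C1 to C3 are invented, so the means differ from the output shown. The same pattern also works for a named subset of columns:

```r
# Hypothetical input: ID column followed by numeric columns
DF <- data.frame(ID = c("A", "B", "C"),
                 C1 = c(1, 4, 7),
                 C2 = c(2, 5, 8),
                 C3 = c(6, 9, 12))

# Mean of every column except the first
all_means <- data.frame(ID = DF[, 1], Means = rowMeans(DF[, -1]))

# Mean of a named subset of columns only
sub_means <- data.frame(ID = DF$ID, Means = rowMeans(DF[, c("C1", "C3")]))

all_means
sub_means
```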

How to calculate rowMeans of columns with similar colnames in r?

We can iterate over the unique column names, subset the matching columns from the original data frame, and take rowMeans of each subset.

sapply(c("A", "B"), function(x) rowMeans(df[,colnames(df) == x]))

#      A    B
# [1,] 2 6.67
# [2,] 3 7.00
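To try this out, a data frame with repeated column names is needed, which data.frame() only allows with check.names = FALSE. The values below are invented for illustration and are not the data behind the output shown; unique(colnames(df)) stands in for the hard-coded c("A", "B").

```r
# Hypothetical data with duplicated column names
df <- data.frame(A = c(1, 2), A = c(3, 4), B = c(5, 6), B = c(7, 8),
                 check.names = FALSE)

# One column of row means per distinct column name
res <- sapply(unique(colnames(df)),
              function(x) rowMeans(df[, colnames(df) == x, drop = FALSE]))
res
```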

Issue with calculating row mean in data table for selected columns in R

OK, so you're doing a couple of things wrong. First, rowMeans can't evaluate a character vector; if you want to select columns with one, you must use .SD and pass the character vector to .SDcols. Second, you're trying to combine a row-wise aggregation with grouping, which I don't think makes much sense. Third, even if your expression didn't throw an error, you are assigning the result back to Table, which would destroy your original data (to add a new column, use := to add it by reference).

What you want to do is calculate the row means of your selected columns, which you can do like this:

Table[, AvgGM := rowMeans(.SD), .SDcols = sel_cols_GM] 
Table[, AvgPM := rowMeans(.SD), .SDcols = sel_cols_PM]

This means: create these new columns as the row means of the subset of data (.SD) made up of the selected columns (.SDcols).
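Here is a minimal sketch of the := pattern, assuming the data.table package is installed. The table contents and the GM_/PM_ column prefixes are invented for illustration; only Table, sel_cols_GM, sel_cols_PM, AvgGM and AvgPM are names taken from the answer.

```r
library(data.table)

# Hypothetical table with two groups of measurement columns
Table <- data.table(id   = 1:3,
                    GM_1 = c(1, 2, 3),    GM_2 = c(3, 4, 5),
                    PM_1 = c(10, 20, 30), PM_2 = c(20, 40, 60))

# Character vectors naming the columns to average (assumed prefixes)
sel_cols_GM <- grep("^GM_", names(Table), value = TRUE)
sel_cols_PM <- grep("^PM_", names(Table), value = TRUE)

# Add the row means by reference; Table itself is modified in place
Table[, AvgGM := rowMeans(.SD), .SDcols = sel_cols_GM]
Table[, AvgPM := rowMeans(.SD), .SDcols = sel_cols_PM]

Table[, .(id, AvgGM, AvgPM)]
```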

calculating row means for only rows that have more than one data point in R

You could make a function that applies a mean to a row based on some condition. In your example, if there are two or more valid measurements, calculate the mean.

a <- c(1,0,NA,1,NA,0,1,0,NA,0,NA)
b <- c(1,0,NA,1,0,1,1,1,NA,0,1)
c <- c(1,NA,NA,0,NA,0,1,1,1,0,0)
mydata <- data.frame(a,b,c)

Functions are best read from the inside out. This one takes a vector x and counts how many of its values are not NA: !is.na(x) returns TRUE/FALSE values, which sum coerces to 1 and 0 before adding them up. It then tests whether more than one value (so two or more) is not NA.

conditionalMean <- function(x) {
  if (sum(!is.na(x)) > 1) {
    mean(x, na.rm = TRUE)
  } else {
    NA
  }
}

We apply this function to your data.frame row-wise, as denoted by MARGIN = 1. If you had a function that worked column-wise, you would use MARGIN = 2. You can try it out. Compare apply(mydata, MARGIN = 2, FUN = mean, na.rm = TRUE) and colMeans(mydata, na.rm = TRUE).

apply(mydata, MARGIN = 1, FUN = conditionalMean)

[1] 1.0000000 0.0000000 NA 0.6666667 NA 0.3333333 1.0000000
[8] 0.6666667 NA 0.0000000 0.5000000
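The same rule can also be written without apply, by counting valid values per row with rowSums and masking the result of rowMeans. This is an alternative sketch rather than the answer's own method; n_valid, means and result are names introduced here.

```r
a <- c(1, 0, NA, 1, NA, 0, 1, 0, NA, 0, NA)
b <- c(1, 0, NA, 1, 0, 1, 1, 1, NA, 0, 1)
c <- c(1, NA, NA, 0, NA, 0, 1, 1, 1, 0, 0)
mydata <- data.frame(a, b, c)

# Number of non-NA values in each row
n_valid <- rowSums(!is.na(mydata))

# Row means ignoring NA (rows that are entirely NA give NaN here)
means <- rowMeans(mydata, na.rm = TRUE)

# Keep the mean only where at least two values were present
result <- unname(ifelse(n_valid > 1, means, NA))
result
```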

Calculate row means on specific columns

You can use aggregate. The formula Reading ~ Sample computes the mean of Reading within each level of Sample:

aggregate(Reading ~ Sample, data = yourdata, FUN = mean)
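The question's data is not reproduced here, so the following sketch assumes a long-format data frame with one Reading per row and a Sample label. The values are invented for illustration.

```r
# Hypothetical long-format data: several readings per sample
yourdata <- data.frame(
  Sample  = c("s1", "s1", "s2", "s2", "s2"),
  Reading = c(1, 3, 2, 4, 6)
)

# One row per Sample, with the mean of its readings
res <- aggregate(Reading ~ Sample, data = yourdata, FUN = mean)
res
```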

Aggregate the mean of 2 columns into one column for each row

You can use apply:

Data:

df <- data.frame(
  Time = c(-200, -1.34, 0.536),
  "1a" = c(-0.02, -0.003, 0.057),
  "1b" = c(-0.006, -0.04, 0.0235)
)

Solution:

df$mean <- apply(df[-1], 1, mean)

Result:

df
      Time    X1a     X1b     mean
1 -200.000 -0.020 -0.0060 -0.01300
2   -1.340 -0.003 -0.0400 -0.02150
3    0.536  0.057  0.0235  0.04025

Alternatively, as suggested by @jay.sf, use rowMeans, which is faster in terms of execution:

rowMeans(df[2:3])
[1] -0.01300 -0.02150 0.04025
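To check the speed difference on your own machine, a rough timing sketch follows. The data frame big and its size are arbitrary choices made for illustration, and timings vary between machines, so no expected numbers are shown.

```r
set.seed(1)

# Hypothetical larger input: 100,000 rows, 2 numeric columns
big <- as.data.frame(matrix(rnorm(1e5 * 2), ncol = 2))

# Time both approaches and keep their results for comparison
t_apply    <- system.time(m1 <- apply(big, 1, mean))
t_rowmeans <- system.time(m2 <- rowMeans(big))

# Both approaches should agree on the values
all.equal(unname(m1), unname(m2))

# Elapsed seconds for each approach
c(apply = t_apply[["elapsed"]], rowMeans = t_rowmeans[["elapsed"]])
```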

Calculate row means on subset of columns selected via external rank

For the first question, you could get the mean of the first two non-NA values per row using apply:

df$BestAvg = apply(df,1,function(x) mean(x[!is.na(x)][1:2]))

In the case that the ranking of coders is actually CoderD > CoderB > CoderC > CoderA:

r = c("CoderD", "CoderB", "CoderC", "CoderA")
df$BestAvg2 = apply(df,1,function(x) mean(x[r][!is.na(x[r])][1:2]))

This returns:

   CoderA CoderB CoderC CoderD BestAvg BestAvg2
1       2      1     NA      1     1.5      1.0
2       1      3      3     NA     2.0      3.0
3      NA     NA      4      5     4.5      4.5
4       7      6      7      6     6.5      6.0
5       3      3      4      2     3.0      2.5
6       2      2     NA     NA     2.0      2.0
7       2     NA      2      1     2.0      1.5
8       5      3     NA      4     4.0      3.5
9       7      7      6     NA     7.0      6.5
10      1     NA      3      4     2.0      3.5
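For a self-contained version, the input can be rebuilt from the four Coder columns of the table above. The helper mean_first_n and the vector coders are hypothetical names introduced here; the sketch also restricts apply to the coder columns so that previously added result columns cannot leak into the calculation.

```r
# Input rebuilt from the Coder columns of the table above
df <- data.frame(
  CoderA = c(2, 1, NA, 7, 3, 2, 2, 5, 7, 1),
  CoderB = c(1, 3, NA, 6, 3, 2, NA, 3, 7, NA),
  CoderC = c(NA, 3, 4, 7, 4, NA, 2, NA, 6, 3),
  CoderD = c(1, NA, 5, 6, 2, NA, 1, 4, NA, 4)
)

# Hypothetical helper: mean of the first n non-NA values of x
mean_first_n <- function(x, n = 2) {
  vals <- x[!is.na(x)]
  mean(vals[seq_len(n)])  # NA if fewer than n values are available
}

coders <- c("CoderA", "CoderB", "CoderC", "CoderD")  # column order
r      <- c("CoderD", "CoderB", "CoderC", "CoderA")  # external ranking

# Reordering the columns changes which values count as "first"
best_avg  <- apply(df[, coders], 1, mean_first_n)
best_avg2 <- apply(df[, r], 1, mean_first_n)

cbind(df, BestAvg = best_avg, BestAvg2 = best_avg2)
```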

