# Calculate Row Means on Subset of Columns

## How to calculate row mean from selected columns

Within the selected columns `w`, subset each row to the values that exceed that row's mean, then take the mean of what is left.

```r
w <- c("01-01-2018", "02-01-2018", "03-01-2018")  ## define columns
apply(data[, w], 1, function(x) mean(x[x > mean(x)]))
# [1]  3.40  2.75  4.90 -0.10  1.15
```

Another way is to `replace` the data points that don't exceed their row mean with `NA` before calculating `rowMeans` with `na.rm=TRUE`. This is about 30 times faster.

```r
rowMeans(replace(data, data <= rowMeans(data[, w]), NA), na.rm=TRUE)
# [1]  3.40  2.75  4.90 -0.10  1.15
```

Data:

```r
data <- structure(list(
  `01-01-2018` = c(1.2, 3.1, 0.7, -0.3, 2),
  `02-01-2018` = c(-0.1, 2.4, 4.9, -3.3, -2.7),
  `03-01-2018` = c(3.4, -2.6, -1.8, 0.1, 0.3)
), class = "data.frame", row.names = c(NA, -5L))
```
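As a quick sanity check, the two approaches can be run side by side on the sample data to confirm they agree. This is a minimal sketch: the names `via_apply`, `via_replace`, and `same` are introduced here for illustration, and both calls are restricted to the columns in `w` so the comparison still holds if `data` had extra columns.

```r
# Sample data copied from the "Data" block above
data <- data.frame(
  `01-01-2018` = c(1.2, 3.1, 0.7, -0.3, 2),
  `02-01-2018` = c(-0.1, 2.4, 4.9, -3.3, -2.7),
  `03-01-2018` = c(3.4, -2.6, -1.8, 0.1, 0.3),
  check.names = FALSE  # keep the date-like column names unchanged
)
w <- c("01-01-2018", "02-01-2018", "03-01-2018")

# Approach 1: per row, average only the values above the row mean
via_apply <- apply(data[, w], 1, function(x) mean(x[x > mean(x)]))

# Approach 2: blank out values at or below the row mean, then use rowMeans
sub <- data[, w]
via_replace <- rowMeans(replace(sub, sub <= rowMeans(sub), NA), na.rm = TRUE)

# Both should give the same five numbers
same <- isTRUE(all.equal(unname(via_apply), unname(via_replace)))
print(same)
```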

## R tidy row means from subset of columns

In my previous version I thought `rowMeans` was the bottleneck, but what actually slows the calculation down is the use of `select`, so it is better to stick with the `grep` family:

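```r
library(dplyr)  # provides %>% and mutate()
df %>% mutate(A = rowMeans(.[, grepl("^A", names(.)), drop = FALSE]))  # drop = FALSE keeps a single match as a data frame
```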
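The same `grepl`-based column selection also works without dplyr. The following is a minimal base-R sketch; the data frame and the name `a_cols` are made up for illustration.

```r
# Made-up data: two columns whose names start with "A" and one that does not
df <- data.frame(A1 = c(1, 2, 3), A2 = c(3, 4, 5), B1 = c(10, 20, 30))

# Logical vector marking the columns whose names start with "A"
a_cols <- grepl("^A", names(df))

# drop = FALSE keeps a data frame even when only one column matches
df$A <- rowMeans(df[, a_cols, drop = FALSE])
print(df)
```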

## Calculate row means on subset of columns

Create a new data.frame that takes the first column of `DF` as a column called `ID`, calculates the mean of all the other fields in each row, and puts the result in a column called `Means`:

```r
data.frame(ID=DF[,1], Means=rowMeans(DF[,-1]))
#   ID    Means
# 1  A 3.666667
# 2  B 4.333333
# 3  C 3.333333
# 4  D 4.666667
# 5  E 4.333333
```
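The original `DF` is not shown, so here is a small sketch with made-up values. It also shows two variations that are assumptions on my part rather than part of the answer: picking the columns by name instead of dropping the first one, and passing `na.rm = TRUE` when some cells are missing.

```r
# Hypothetical data; column names C1..C3 are invented for this example
DF <- data.frame(
  ID = c("A", "B", "C"),
  C1 = c(1, 2, 3),
  C2 = c(3, 4, 5),
  C3 = c(8, NA, 1)
)

# Row means over a named subset of columns
subset_means <- data.frame(ID = DF$ID, Means = rowMeans(DF[, c("C1", "C2")]))

# Row means over everything except the ID column, ignoring missing cells
all_means <- data.frame(ID = DF[, 1], Means = rowMeans(DF[, -1], na.rm = TRUE))

print(subset_means)
print(all_means)
```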

## How to calculate rowMeans of columns with similar colnames in r?

We can iterate over the unique names, subset the matching columns from the original data frame, and take `rowMeans`.

```r
sapply(c("A", "B"), function(x) rowMeans(df[,colnames(df) == x]))
#     A    B
#[1,] 2 6.67
#[2,] 3 7.00
```
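To try this out you need a data frame with repeated column names, which `data.frame` only keeps with `check.names = FALSE`. The sketch below uses made-up values and takes the group names from `unique(colnames(df))` instead of typing them out; `drop = FALSE` is added so that a name occurring only once would still work.

```r
# Made-up data with duplicated column names "A" and "B"
df <- data.frame(
  A = c(1, 2), A = c(3, 4),
  B = c(5, 6), B = c(7, 8), B = c(8, 7),
  check.names = FALSE  # without this the names become A, A.1, B, B.1, B.2
)

# One column of row means per distinct column name
group_means <- sapply(
  unique(colnames(df)),
  function(x) rowMeans(df[, colnames(df) == x, drop = FALSE])
)
print(round(group_means, 2))
```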

## Issue with calculating row mean in data table for selected columns in R

OK, so you're doing a couple of things wrong. First, `rowMeans` can't evaluate a character vector; if you want to select columns with one, you must use `.SD` and pass the character vector to `.SDcols`. Second, you're trying to combine a row-wise aggregation with grouping, which I don't think makes much sense. Third, even if your expression didn't throw an error, you are assigning the result back to `Table`, which would destroy your original data (if you want to add a new column, use `:=` to add it by reference).

What you want to do is calculate the row means of your selected columns, which you can do like this:

```r
Table[, AvgGM := rowMeans(.SD), .SDcols = sel_cols_GM]
Table[, AvgPM := rowMeans(.SD), .SDcols = sel_cols_PM]
```

In words: create each new column as the row means of the subset of data (`.SD`) that is restricted to the given columns (`.SDcols`).
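The question's `Table` is not shown, so the following sketch uses a made-up one. The column names (`GM_1`, `PM_1`, and so on) and values are assumptions for illustration only, and the data.table package needs to be installed.

```r
library(data.table)

# Hypothetical table; the real column names in the question are unknown
Table <- data.table(
  id   = 1:3,
  GM_1 = c(1, 2, 3),    GM_2 = c(3, 4, 5),
  PM_1 = c(10, 20, 30), PM_2 = c(20, 40, 60)
)
sel_cols_GM <- c("GM_1", "GM_2")
sel_cols_PM <- c("PM_1", "PM_2")

# := adds the columns by reference, so Table itself is updated in place
Table[, AvgGM := rowMeans(.SD), .SDcols = sel_cols_GM]
Table[, AvgPM := rowMeans(.SD), .SDcols = sel_cols_PM]
print(Table)
```

Because `:=` modifies `Table` by reference, there is no need to assign the result back to `Table`.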

## calculating row means for only rows that have more than one data point in R

You could make a function that applies a mean to a row based on some condition. In your example, if there are two or more valid measurements, calculate mean.

```r
a <- c(1,0,NA,1,NA,0,1,0,NA,0,NA)
b <- c(1,0,NA,1,0,1,1,1,NA,0,1)
c <- c(1,NA,NA,0,NA,0,1,1,1,0,0)
mydata <- data.frame(a,b,c)
```

Functions are best read from the inside out. This one takes a vector `x` and counts how many of its values are not `NA`: `sum` coerces the `TRUE`/`FALSE` values from `!is.na(x)` to 1 and 0 before adding them up. It then tests whether more than 1 (so 2 or more) values are not `NA`, and only then returns the mean.

```r
conditionalMean <- function(x) {
  if (sum(!is.na(x)) > 1) {
    mean(x, na.rm = TRUE)
  } else {
    NA
  }
}
```

We apply this function to your `data.frame` row-wise, as denoted by `MARGIN = 1`. If you had a function that worked column-wise, you would use `MARGIN = 2`. You can try it out. Compare `apply(mydata, MARGIN = 2, FUN = mean, na.rm = TRUE)` and `colMeans(mydata, na.rm = TRUE)`.

```r
apply(mydata, MARGIN = 1, FUN = conditionalMean)
#  [1] 1.0000000 0.0000000        NA 0.6666667        NA 0.3333333 1.0000000
#  [8] 0.6666667        NA 0.0000000 0.5000000
```
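A different route to the same result, not part of the answer above, is to avoid `apply` and use the vectorised `rowSums` and `rowMeans`. This is a sketch under the same rule (at least two non-`NA` values per row); the names `n_valid` and `vectorised` are introduced here for illustration.

```r
a <- c(1, 0, NA, 1, NA, 0, 1, 0, NA, 0, NA)
b <- c(1, 0, NA, 1, 0, 1, 1, 1, NA, 0, 1)
c <- c(1, NA, NA, 0, NA, 0, 1, 1, 1, 0, 0)
mydata <- data.frame(a, b, c)

# Number of non-NA measurements in each row
n_valid <- rowSums(!is.na(mydata))

# Keep the row mean only where there are 2 or more valid values
vectorised <- ifelse(n_valid > 1, rowMeans(mydata, na.rm = TRUE), NA)
print(unname(vectorised))
```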

## Calculate row means on specific columns

You can use `aggregate` to compute the mean of `Reading` for each `Sample`:

```r
aggregate(Reading ~ Sample, data = yourdata, mean)
```
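Since `yourdata` is not shown, here is a small sketch with invented values to illustrate what the formula interface returns: one row per `Sample` with the mean of its `Reading` values.

```r
# Hypothetical long-format data: several readings per sample
yourdata <- data.frame(
  Sample  = c("s1", "s1", "s2", "s2", "s2"),
  Reading = c(1, 3, 2, 4, 6)
)

# Mean Reading within each Sample
res <- aggregate(Reading ~ Sample, data = yourdata, mean)
print(res)
```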

## Aggregate the mean 2 columns to become one column for each row

You can use `apply`:

Data:

```r
df <- data.frame(
  Time = c(-200, -1.34, 0.536),
  "1a" = c(-0.02, -0.003, 0.057),
  "1b" = c(-0.006, -0.04, 0.0235)
)
```

Solution:

```r
df$mean <- apply(df[-1], 1, mean)
```

Result:

```r
df
#       Time    X1a     X1b     mean
# 1 -200.000 -0.020 -0.0060 -0.01300
# 2   -1.340 -0.003 -0.0400 -0.02150
# 3    0.536  0.057  0.0235  0.04025
```

Alternatively, as suggested by @jay.sf, use `rowMeans`, which is faster in terms of execution:

```r
rowMeans(df[2:3])
# [1] -0.01300 -0.02150  0.04025
```
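Note that the result shows the columns as `X1a` and `X1b`, because `data.frame` rewrites names that are not syntactically valid. If the original names matter, one option is to pass `check.names = FALSE` and then select the columns by name; the sketch below assumes the same data as above.

```r
df <- data.frame(
  Time = c(-200, -1.34, 0.536),
  "1a" = c(-0.02, -0.003, 0.057),
  "1b" = c(-0.006, -0.04, 0.0235),
  check.names = FALSE  # keep "1a" and "1b" instead of X1a and X1b
)

# Select the two measurement columns by name rather than by position
df$mean <- rowMeans(df[, c("1a", "1b")])
print(names(df))
print(df$mean)
```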

## Calculate row means on subset of columns selected via external rank

For the first question, you could get the mean of the first two non-`NA` values per row using `apply`:

```r
df$BestAvg = apply(df,1,function(x) mean(x[!is.na(x)][1:2]))
```

In the case that the ranking of coders is actually `CoderD > CoderB > CoderC > CoderA`:

```r
r = c("CoderD", "CoderB", "CoderC", "CoderA")
df$BestAvg2 = apply(df,1,function(x) mean(x[r][!is.na(x[r])][1:2]))
```

This returns:

```
   CoderA CoderB CoderC CoderD BestAvg BestAvg2
1       2      1     NA      1     1.5      1.0
2       1      3      3     NA     2.0      3.0
3      NA     NA      4      5     4.5      4.5
4       7      6      7      6     6.5      6.0
5       3      3      4      2     3.0      2.5
6       2      2     NA     NA     2.0      2.0
7       2     NA      2      1     2.0      1.5
8       5      3     NA      4     4.0      3.5
9       7      7      6     NA     7.0      6.5
10      1     NA      3      4     2.0      3.5
```
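For a self-contained run, the coder columns can be rebuilt from the table above. In this sketch the helper `best_avg` and the vector `coders` are hypothetical names introduced for illustration; the helper wraps the same logic as the two `apply` calls and is applied to the coder columns only, so that already-computed averages are never picked up as input.

```r
# Coder ratings rebuilt from the result table above
df <- data.frame(
  CoderA = c(2, 1, NA, 7, 3, 2, 2, 5, 7, 1),
  CoderB = c(1, 3, NA, 6, 3, 2, NA, 3, 7, NA),
  CoderC = c(NA, 3, 4, 7, 4, NA, 2, NA, 6, 3),
  CoderD = c(1, NA, 5, 6, 2, NA, 1, 4, NA, 4)
)

# Hypothetical helper: reorder a row by `rank`, drop NAs, average the first n
best_avg <- function(x, rank, n = 2) {
  ranked <- x[rank]
  mean(ranked[!is.na(ranked)][seq_len(n)])
}

coders <- c("CoderA", "CoderB", "CoderC", "CoderD")  # column order
r <- c("CoderD", "CoderB", "CoderC", "CoderA")       # external ranking

df$BestAvg  <- apply(df[coders], 1, best_avg, rank = coders)
df$BestAvg2 <- apply(df[coders], 1, best_avg, rank = r)
print(df)
```

As in the original one-liners, a row with fewer than `n` non-`NA` values yields `NA`, because indexing past the end of the vector returns `NA`.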