How to calculate row mean from selected columns
In each row, keep only the values that exceed that row's mean (taken over the columns in w), then average what remains.
w <- c("01-01-2018", "02-01-2018", "03-01-2018") ## define columns
apply(data[, w], 1, function(x) mean(x[x > mean(x)]))
# [1] 3.40 2.75 4.90 -0.10 1.15
Another way is to replace the data points that don't exceed the row means with NAs before calling rowMeans with na.rm=TRUE. This is about 30 times faster.
rowMeans(replace(data[, w], data[, w] <= rowMeans(data[, w]), NA), na.rm=TRUE)
# [1] 3.40 2.75 4.90 -0.10 1.15
Data:
data <- structure(list(`01-01-2018` = c(1.2, 3.1, 0.7, -0.3, 2), `02-01-2018` = c(-0.1,
2.4, 4.9, -3.3, -2.7), `03-01-2018` = c(3.4, -2.6, -1.8, 0.1,
0.3)), class = "data.frame", row.names = c(NA, -5L))
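As a quick sanity check, the two approaches can be compared on the sample data. This is only a sketch: the names sub, row_avg, via_apply and via_replace are illustrative and do not come from the original answer.

```r
# Sample data from the answer above (check.names = FALSE keeps the date-like names)
data <- data.frame(
  `01-01-2018` = c(1.2, 3.1, 0.7, -0.3, 2),
  `02-01-2018` = c(-0.1, 2.4, 4.9, -3.3, -2.7),
  `03-01-2018` = c(3.4, -2.6, -1.8, 0.1, 0.3),
  check.names = FALSE
)
w <- c("01-01-2018", "02-01-2018", "03-01-2018")

sub <- data[, w]           # only the selected columns
row_avg <- rowMeans(sub)   # one mean per row

# Approach 1: row-wise apply, keeping values above the row mean
via_apply <- apply(sub, 1, function(x) mean(x[x > mean(x)]))

# Approach 2: blank out values at or below the row mean, then rowMeans
via_replace <- rowMeans(replace(sub, sub <= row_avg, NA), na.rm = TRUE)

# Both vectors should agree
all.equal(unname(via_apply), unname(via_replace))
```

Comparing a data frame with a vector that has one element per row lines the vector up against each column, which is why sub <= row_avg gives a row-wise comparison here.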
R tidy row means from subset of columns
In my previous version I thought that rowMeans was the concern, but what actually slows down the calculation is the use of select. It is better to stick with the grep family:
df %>% mutate(A = rowMeans(.[, grepl("^A", names(.))]))
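The same grepl selection also works without dplyr. Below is a minimal base R sketch, assuming a hypothetical data frame df whose columns of interest start with "A" (the data and the helper name is_a are made up for illustration). Passing drop = FALSE keeps the selection a data frame even if only one column matches, which rowMeans needs.

```r
# Hypothetical example data: two columns starting with "A", one with "B"
df <- data.frame(A1 = c(1, 2, 3), A2 = c(3, 4, 5), B1 = c(10, 20, 30))

# Logical vector marking the columns whose names start with "A"
is_a <- grepl("^A", names(df))

# Row means over the selected columns only
df$A <- rowMeans(df[, is_a, drop = FALSE])
df
```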
Calculate row means on subset of columns
Create a new data.frame that takes the first column of DF as a column called ID, calculates the mean of all the other fields in each row, and puts the result in a column called 'Means':
data.frame(ID=DF[,1], Means=rowMeans(DF[,-1]))
ID Means
1 A 3.666667
2 B 4.333333
3 C 3.333333
4 D 4.666667
5 E 4.333333
How to calculate rowMeans of columns with similar colnames in r?
We can iterate over the unique names, subset the matching columns from the original data frame, and take rowMeans.
sapply(c("A", "B"), function(x) rowMeans(df[,colnames(df) == x]))
# A B
#[1,] 2 6.67
#[2,] 3 7.00
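To avoid hard-coding the names, the same idea can be written over unique(colnames(df)). This is a sketch on a made-up data frame with repeated column names, not the asker's data; drop = FALSE guards against a name that occurs only once.

```r
# Hypothetical data with repeated column names
df <- data.frame(c(1, 2), c(3, 4), c(5, 6), c(7, 8))
names(df) <- c("A", "A", "B", "B")

# One group per distinct column name
groups <- unique(colnames(df))

# For each name, average the columns that carry it, row by row
res <- sapply(groups, function(x) {
  rowMeans(df[, colnames(df) == x, drop = FALSE])
})
res
```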
Issue with calculating row mean in data table for selected columns in R
OK, so you're doing a couple of things wrong. First, rowMeans can't evaluate a character vector; if you want to select columns with one, you must use .SD and pass the character vector to .SDcols. Second, you're trying to combine a row aggregation with grouping, which I don't think makes much sense. Third, even if your expression didn't throw an error, you are assigning the result back to Table, which would destroy your original data (to add a new column, use := to add it by reference).
What you want to do is calculate the row means of your selected columns, which you can do like this:
Table[, AvgGM := rowMeans(.SD), .SDcols = sel_cols_GM]
Table[, AvgPM := rowMeans(.SD), .SDcols = sel_cols_PM]
This creates the new columns as the row means of the subset of the data (.SD) restricted to the selected columns (.SDcols).
calculating row means for only rows that have more than one data point in R
You could make a function that applies a mean to a row based on some condition. In your example, if there are two or more valid measurements, calculate mean.
a <- c(1,0,NA,1,NA,0,1,0,NA,0,NA)
b <- c(1,0,NA,1,0,1,1,1,NA,0,1)
c <- c(1,NA,NA,0,NA,0,1,1,1,0,0)
mydata <- data.frame(a,b,c)
Functions are best read from the inside out. This one takes a vector x and counts how many of its values are not NA: when sum adds up the TRUE/FALSE values from !is.na(x), they are first coerced to 1 and 0, respectively. It then tests whether more than 1 (that is, 2 or more) values are not NA.
conditionalMean <- function(x) {
if (sum(!is.na(x)) > 1) {
mean(x, na.rm = TRUE)
} else {
NA
}
}
We apply this function to your data.frame row-wise, as denoted by MARGIN = 1. If you had a function that worked column-wise, you would use MARGIN = 2. To try it out, compare apply(mydata, MARGIN = 2, FUN = mean, na.rm = TRUE) with colMeans(mydata, na.rm = TRUE).
apply(mydata, MARGIN = 1, FUN = conditionalMean)
[1] 1.0000000 0.0000000 NA 0.6666667 NA 0.3333333 1.0000000
[8] 0.6666667 NA 0.0000000 0.5000000
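For larger data, the same rule can also be expressed without apply. The sketch below is an alternative, not part of the original answer: it computes all row means at once and then blanks out rows with fewer than two valid values. The names n_valid and vec_mean are illustrative.

```r
# Same example data as above
mydata <- data.frame(
  a = c(1, 0, NA, 1, NA, 0, 1, 0, NA, 0, NA),
  b = c(1, 0, NA, 1, 0, 1, 1, 1, NA, 0, 1),
  c = c(1, NA, NA, 0, NA, 0, 1, 1, 1, 0, 0)
)

# Number of non-NA measurements in each row
n_valid <- rowSums(!is.na(mydata))

# Row means ignoring NAs (rows that are all NA come out as NaN here)
vec_mean <- rowMeans(mydata, na.rm = TRUE)

# Keep a mean only where at least two values were observed
vec_mean[n_valid < 2] <- NA
vec_mean
```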
Calculate row means on specific columns
You can use aggregate to average Reading within each Sample:
aggregate(Reading~Sample,data=yourdata, mean)
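A small worked example may make the formula interface clearer. The data frame below is hypothetical (the original question's data is not shown here); only the column names Sample and Reading are taken from the call above.

```r
# Hypothetical long-format data: several readings per sample
yourdata <- data.frame(
  Sample  = c("s1", "s1", "s2", "s2"),
  Reading = c(1, 3, 10, 20)
)

# Mean of Reading for each level of Sample
avg <- aggregate(Reading ~ Sample, data = yourdata, FUN = mean)
avg
```

The result has one row per Sample, with Reading holding that sample's mean.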
Aggregate the mean 2 columns to become one column for each row
You can use apply:
Data:
df <- data.frame(
Time = c(-200, -1.34, 0.536),
"1a" = c(-0.02, -0.003, 0.057),
"1b" = c(-0.006, -0.04, 0.0235)
)
Solution:
df$mean <- apply(df[-1], 1, mean)
Result:
df
Time X1a X1b mean
1 -200.000 -0.020 -0.0060 -0.01300
2 -1.340 -0.003 -0.0400 -0.02150
3 0.536 0.057 0.0235 0.04025
Alternatively, as suggested by @jay.sf, use rowMeans, which executes faster:
rowMeans(df[2:3])
[1] -0.01300 -0.02150 0.04025
Calculate row means on subset of columns selected via external rank
For the first question, you could get the mean of the first 2 non-NA values per row using apply:
df$BestAvg = apply(df,1,function(x) mean(x[!is.na(x)][1:2]))
If the ranking of coders is actually CoderD > CoderB > CoderC > CoderA:
r = c("CoderD", "CoderB", "CoderC", "CoderA")
df$BestAvg2 = apply(df,1,function(x) mean(x[r][!is.na(x[r])][1:2]))
This returns:
CoderA CoderB CoderC CoderD BestAvg BestAvg2
1 2 1 NA 1 1.5 1.0
2 1 3 3 NA 2.0 3.0
3 NA NA 4 5 4.5 4.5
4 7 6 7 6 6.5 6.0
5 3 3 4 2 3.0 2.5
6 2 2 NA NA 2.0 2.0
7 2 NA 2 1 2.0 1.5
8 5 3 NA 4 4.0 3.5
9 7 7 6 NA 7.0 6.5
10 1 NA 3 4 2.0 3.5
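The ranking logic can be wrapped in a small helper. This is a sketch only: best_avg is a hypothetical name, and the data frame is reconstructed from the four coder columns of the table above. One deliberate difference from the one-liners: head() takes up to n values, so a row with a single rating returns that rating instead of NA.

```r
# Ratings reconstructed from the coder columns of the table above
df <- data.frame(
  CoderA = c(2, 1, NA, 7, 3, 2, 2, 5, 7, 1),
  CoderB = c(1, 3, NA, 6, 3, 2, NA, 3, 7, NA),
  CoderC = c(NA, 3, 4, 7, 4, NA, 2, NA, 6, 3),
  CoderD = c(1, NA, 5, 6, 2, NA, 1, 4, NA, 4)
)

# Hypothetical helper: mean of the first n non-NA ratings,
# taken in the order given by `ranking` (best coder first)
best_avg <- function(df, ranking, n = 2) {
  apply(df[, ranking, drop = FALSE], 1, function(x) {
    mean(head(x[!is.na(x)], n))
  })
}

r <- c("CoderD", "CoderB", "CoderC", "CoderA")
df$BestAvg2 <- best_avg(df, r)
df
```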