Calculate Mean Across Rows with NA Values in R

R: How to calculate mean for each row with missing values using dplyr

library(dplyr)

df %>%
  mutate(means = rowMeans(., na.rm = TRUE))

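The . is a "pronoun" that refers to the data frame df that was piped into mutate(). Because rowMeans() requires numeric input, every column of df must be numeric for this call to work.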

  A B  C    means
1 3 0  9 4.000000
2 4 6 NA 5.000000
3 5 8  1 4.666667

You can also select only specific columns to include, using all the usual methods (column names, indices, grep, etc.).

df %>%
  mutate(means = rowMeans(.[ , c("A","C")], na.rm = TRUE))

  A B  C means
1 3 0  9     6
2 4 6 NA     4
3 5 8  1     3
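
For a reproducible version of the two calls above, the sketch below rebuilds df from the values shown in the printed output and selects columns with across() instead of indexing the . pronoun. It assumes dplyr 1.0.0 or later (newer releases also offer pick() for this kind of selection); the column name means_AC is only illustrative.

```r
library(dplyr)

# Data frame reconstructed from the printed output above
df <- data.frame(A = c(3, 4, 5), B = c(0, 6, 8), C = c(9, NA, 1))

res <- df %>%
  mutate(
    # Row means over every numeric column (A, B and C at this point)
    means = rowMeans(across(where(is.numeric)), na.rm = TRUE),
    # Row means over columns A and C only
    means_AC = rowMeans(across(c(A, C)), na.rm = TRUE)
  )

print(res)
```

Selecting with where(is.numeric) can be handy when the data frame also holds character or factor columns, which rowMeans() would otherwise reject.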

Calculating mean with NA value present in a data.frame using R

Pass na.rm = TRUE to mean() so that missing values are dropped before averaging:

df2 <- df1 %>%
  group_by(st, date) %>%
  summarise(ph = mean(ph, na.rm = TRUE))

df2
# A tibble: 3 x 3
# Groups:   st [3]
     st date          ph
  <int> <chr>      <dbl>
1     1 01/02/2004     5
2     2 01/02/2004     8
3    16 01/02/2004     6
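
To see what na.rm = TRUE changes, here is a minimal sketch on made-up data. The df1 below is hypothetical (the original data is not shown), and the column names ph_default, ph_na_rm and n_obs are chosen only for illustration.

```r
library(dplyr)

# Hypothetical data: station (st), date, and pH readings with gaps
df1 <- data.frame(
  st   = c(1L, 1L, 2L, 2L, 16L, 16L),
  date = "01/02/2004",
  ph   = c(5, NA, 8, 8, NA, NA)
)

df2 <- df1 %>%
  group_by(st, date) %>%
  summarise(
    ph_default = mean(ph),                # NA if any value in the group is NA
    ph_na_rm   = mean(ph, na.rm = TRUE),  # NaN if every value in the group is NA
    n_obs      = sum(!is.na(ph)),         # number of values actually averaged
    .groups = "drop"
  )

print(df2)
```

Keeping a count such as n_obs next to the mean makes it easier to spot groups whose average rests on very few (or zero) observations.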

How to calculate means when you have missing values?

To match the mean that Excel reports, repeat each time value as many times as its count in the df column, then average the expanded vector:

mean(rep(df$time, df$df))
#[1] 17.85714
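
The same result can be obtained without expanding the vector by using base R's weighted.mean(). The sketch below uses hypothetical values (the original data is not shown) and assumes the df column holds a count for each time value.

```r
# Hypothetical data: each time value was observed `df` times (a frequency column)
df <- data.frame(time = c(10, 20, 30), df = c(3, 2, 2))

# Expanding the values, as in the answer above
m_rep <- mean(rep(df$time, df$df))

# Equivalent frequency-weighted mean without expanding the vector
m_wt <- weighted.mean(df$time, w = df$df)

# With a missing time value, na.rm = TRUE drops it together with its weight
df_na <- data.frame(time = c(10, NA, 30), df = c(3, 2, 2))
m_na <- weighted.mean(df_na$time, w = df_na$df, na.rm = TRUE)

print(c(m_rep, m_wt, m_na))
```

Note that na.rm in weighted.mean() only covers missing values in the data; a missing weight still yields NA.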

Average across Columns in R, excluding NAs

Use rowMeans(), and set its na.rm argument to TRUE so that missing values are ignored. For example:

> mat <- matrix(c(23,2,NA,NA,2,9,23,2,9), ncol = 3)
> mat
     [,1] [,2] [,3]
[1,]   23   NA   23
[2,]    2    2    2
[3,]   NA    9    9
> rowMeans(mat)
[1] NA  2 NA
> rowMeans(mat, na.rm = TRUE)
[1] 23  2  9

To match your example:

> dat <- data.frame(Trait = c("DF","DG","DH"), mat)
> names(dat) <- c("Trait", paste0("Col", 1:3))
> dat
  Trait Col1 Col2 Col3
1    DF   23   NA   23
2    DG    2    2    2
3    DH   NA    9    9
> dat <- transform(dat, Col4 = rowMeans(dat[,-1], na.rm = TRUE))
> dat
  Trait Col1 Col2 Col3 Col4
1    DF   23   NA   23   23
2    DG    2    2    2    2
3    DH   NA    9    9    9
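
One edge case worth knowing: when a row contains nothing but NA, rowMeans(..., na.rm = TRUE) returns NaN rather than NA. The sketch below uses a hypothetical matrix mat2 to show this, along with one possible way to report such rows as NA.

```r
# Hypothetical matrix whose second row contains only missing values
mat2 <- matrix(c(23, NA, 23,
                 NA, NA, NA), nrow = 2, byrow = TRUE)

m <- rowMeans(mat2, na.rm = TRUE)  # second element is NaN (nothing left to average)
n_used <- rowSums(!is.na(mat2))    # number of non-missing values behind each mean

# Optional: report "no data" rows as NA instead of NaN
m[is.nan(m)] <- NA

print(m)
print(n_used)
```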

How to calculate mean value for each column ignoring NA

For a data.table dt, that looks like this:

dt
   Var1 Var2 Var3 Var4 Var12
1:    1   NA    2    3     4
2:    5    6    2    3     3
3:   NA    7    8   NA     4

You can simply apply mean() to every column with lapply() over .SD:

dt[, lapply(.SD, mean, na.rm = TRUE)]

The result is:

   Var1 Var2 Var3 Var4    Var12
1:    3  6.5    4    3 3.666667
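
If data.table is not available, base R offers the same column-wise means. This is a swapped-in alternative rather than the approach above: the sketch rebuilds the values of dt as a plain data frame (named d here purely for illustration) and uses colMeans() and sapply().

```r
# Plain data frame with the same values as dt above
d <- data.frame(
  Var1  = c(1, 5, NA),
  Var2  = c(NA, 6, 7),
  Var3  = c(2, 2, 8),
  Var4  = c(3, 3, NA),
  Var12 = c(4, 3, 4)
)

# colMeans() handles all columns in one call; every column must be numeric
cm <- colMeans(d, na.rm = TRUE)

# sapply() applies mean() column by column, mirroring lapply(.SD, ...)
sm <- sapply(d, mean, na.rm = TRUE)

print(cm)
```

Both calls return a named numeric vector rather than a one-row table.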

Calculate mean of each row in a large list of dataframes in R

We can bind the list elements into a single data frame and then compute a grouped mean:

library(dplyr)
bind_rows(lst1) %>%
group_by(id) %>%
summarise(value_mean = mean(value, na.rm = TRUE), .groups = 'drop')

Output:

# A tibble: 3 x 2
  id    value_mean
  <chr>      <dbl>
1 id1         0.25
2 id2         0.25
3 id3         0.5

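If the datasets have the same dimensions and the 'id' values are in the same order, extract the 'value' column from each, add them elementwise with Reduce, and divide by the length of the list. Note that + propagates NA, so this shortcut assumes the 'value' columns contain no missing values.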

Reduce(`+`, lapply(lst1, `[[`, "value"))/length(lst1)
[1] 0.25 0.25 0.50
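
When the 'value' columns may contain NA, one NA-aware variant is to bind the vectors into a matrix and call rowMeans() with na.rm = TRUE. The list lst1 below is hypothetical (the original data is not shown) and, as above, assumes every data frame lists the same ids in the same order.

```r
# Hypothetical list of data frames with identical ids in the same order
lst1 <- list(
  data.frame(id = c("id1", "id2", "id3"), value = c(0, 0.5, NA)),
  data.frame(id = c("id1", "id2", "id3"), value = c(0.5, 0, 0.5))
)

# Reduce(`+`, ...) propagates NA, so the mean for id3 becomes NA
m_reduce <- Reduce(`+`, lapply(lst1, `[[`, "value")) / length(lst1)

# Bind the value vectors into a matrix (one column per data frame),
# then average each row while skipping missing values
vals <- sapply(lst1, `[[`, "value")
m_rows <- rowMeans(vals, na.rm = TRUE)

print(m_reduce)
print(m_rows)
```

With na.rm = TRUE each row is divided by its own count of non-missing values, not by the length of the list.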

A potentially more efficient approach uses dapply and t_list from the collapse package:

library(collapse)
dapply(t_list(dapply(lst1, `[[`, "value")), fmean)
  V1   V2   V3
0.25 0.25 0.50

