R: How to calculate mean for each row with missing values using dplyr
df %>%
  mutate(means = rowMeans(., na.rm = TRUE))
The . is a "pronoun" that references the data frame df that was piped into mutate.
  A B  C    means
1 3 0  9 4.000000
2 4 6 NA 5.000000
3 5 8  1 4.666667
You can also select only specific columns to include, using all the usual methods (column names, indices, grep, etc.).
df %>%
  mutate(means = rowMeans(.[ , c("A","C")], na.rm = TRUE))
  A B  C means
1 3 0  9     6
2 4 6 NA     4
3 5 8  1     3
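For readers without dplyr loaded, the same two results can be sketched in base R. The data frame is rebuilt from the output printed above; the column names means_all and means_AC are made up here for illustration.

```r
# Rebuild the example data frame from the printed output above
df <- data.frame(A = c(3, 4, 5), B = c(0, 6, 8), C = c(9, NA, 1))

# Row means over all three columns, skipping NA values
df$means_all <- rowMeans(df[, c("A", "B", "C")], na.rm = TRUE)

# Row means over columns A and C only
df$means_AC <- rowMeans(df[, c("A", "C")], na.rm = TRUE)

print(df)
```

Selecting the columns explicitly keeps the new means_all column from being averaged into means_AC, which could happen if the whole data frame were passed to rowMeans() a second time.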
Calculating mean with NA value present in a data.frame using R
You need to use na.rm = TRUE:
df2 <- df1 %>%
  group_by(st, date) %>%
  summarise(ph = mean(ph, na.rm = TRUE))
df2
# A tibble: 3 x 3
# Groups:   st [3]
     st date          ph
  <int> <chr>      <dbl>
1     1 01/02/2004     5
2     2 01/02/2004     8
3    16 01/02/2004     6
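To see what na.rm = TRUE changes, here is a small base R sketch using tapply(). The original df1 is not shown, so the values below are hypothetical and only the column names st, date and ph come from the answer.

```r
# Hypothetical data: the original df1 is not shown, so these values are made up
df1 <- data.frame(
  st   = c(1, 1, 2, 2),
  date = rep("01/02/2004", 4),
  ph   = c(5, NA, 8, 8)
)

# Without na.rm, a group that contains an NA gets an NA mean
with_na <- tapply(df1$ph, df1$st, mean)

# With na.rm = TRUE, NA values are dropped before averaging
without_na <- tapply(df1$ph, df1$st, mean, na.rm = TRUE)

print(with_na)
print(without_na)
```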
How to calculate means when you have missing values?
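To match the mean computed in Excel, repeat each time value df number of times (the df column holds the count for each time value), then take the mean of the expanded vector.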
mean(rep(df$time, df$df))
#[1] 17.85714
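Repeating each value by its count is the same as taking a count-weighted mean, so weighted.mean() from base R should give the same number without building the expanded vector. The data below are hypothetical (the original data frame is not shown); only the column names time and df follow the answer.

```r
# Hypothetical data: each time value and how many times it occurs
df <- data.frame(time = c(10, 20, 30), df = c(3, 2, 2))

# Expand each time value by its count, then average
expanded_mean <- mean(rep(df$time, df$df))

# Same quantity, computed as a weighted mean with the counts as weights
weighted <- weighted.mean(df$time, df$df)

print(c(expanded = expanded_mean, weighted = weighted))
```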
Average across Columns in R, excluding NAs
You want rowMeans(), but note that it has an na.rm argument that you need to set to TRUE. E.g.:
> mat <- matrix(c(23,2,NA,NA,2,9,23,2,9), ncol = 3)
> mat
     [,1] [,2] [,3]
[1,]   23   NA   23
[2,]    2    2    2
[3,]   NA    9    9
> rowMeans(mat)
[1] NA 2 NA
> rowMeans(mat, na.rm = TRUE)
[1] 23 2 9
To match your example:
> dat <- data.frame(Trait = c("DF","DG","DH"), mat)
> names(dat) <- c("Trait", paste0("Col", 1:3))
> dat
  Trait Col1 Col2 Col3
1    DF   23   NA   23
2    DG    2    2    2
3    DH   NA    9    9
> dat <- transform(dat, Col4 = rowMeans(dat[,-1], na.rm = TRUE))
> dat
  Trait Col1 Col2 Col3 Col4
1    DF   23   NA   23   23
2    DG    2    2    2    2
3    DH   NA    9    9    9
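One edge case worth knowing about: when every value in a row is NA, rowMeans(..., na.rm = TRUE) returns NaN rather than NA, because nothing is left to average. A minimal sketch, using a hypothetical matrix that is not part of the question:

```r
# Hypothetical matrix whose second row is entirely NA
mat <- matrix(c(1, NA, 3, NA), ncol = 2)

# Row 1 averages 1 and 3; row 2 has no values left after dropping NAs
raw_means <- rowMeans(mat, na.rm = TRUE)

# Optionally turn the NaN from all-NA rows back into NA
means <- raw_means
means[is.nan(means)] <- NA

print(raw_means)
print(means)
```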
How to calculate mean value for each column ignoring NA
For a data.table dt that looks like this:
dt
   Var1 Var2 Var3 Var4 Var12
1:    1   NA    2    3     4
2:    5    6    2    3     3
3:   NA    7    8   NA     4
You can simply use lapply():
dt[, lapply(.SD, mean, na.rm = TRUE)]
The result is:
   Var1 Var2 Var3 Var4    Var12
1:    3  6.5    4    3 3.666667
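If the data are in a plain data.frame rather than a data.table, colMeans() with na.rm = TRUE is a base R alternative that should give the same numbers. The sketch below rebuilds the table shown above as an ordinary data frame; it returns a named numeric vector instead of a one-row table.

```r
# Same values as the data.table printed above, as a plain data frame
df <- data.frame(
  Var1  = c(1, 5, NA),
  Var2  = c(NA, 6, 7),
  Var3  = c(2, 2, 8),
  Var4  = c(3, 3, NA),
  Var12 = c(4, 3, 4)
)

# Column means, skipping NA values; the result is a named numeric vector
col_means <- colMeans(df, na.rm = TRUE)

print(col_means)
```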
Calculate mean of each row in a large list of dataframes in R
We may bind the list elements into a single data frame and then do a grouped mean operation:
library(dplyr)
bind_rows(lst1) %>%
  group_by(id) %>%
  summarise(value_mean = mean(value, na.rm = TRUE), .groups = 'drop')
Output:
# A tibble: 3 x 2
  id    value_mean
  <chr>      <dbl>
1 id1         0.25
2 id2         0.25
3 id3         0.5
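If the datasets have the same dimensions and the 'id' values are in the same order, extract the 'value' column from each, use Reduce to do an elementwise +, and divide by the length of the list. Note that, unlike the grouped mean above, this does not skip NA values: any NA in 'value' makes the result for that position NA.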
Reduce(`+`, lapply(lst1, `[[`, "value"))/length(lst1)
[1] 0.25 0.25 0.50
Or, a more efficient approach is to use dapply/t_list from collapse:
library(collapse)
dapply(t_list(dapply(lst1, `[[`, "value")), fmean)
  V1   V2   V3 
0.25 0.25 0.50 
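When the data frames line up row for row but some values are NA, one base R option is to collect the 'value' columns into a matrix and use rowMeans() with na.rm = TRUE. This is a sketch under that alignment assumption; the list lst1 below is hypothetical, since the original list is not shown.

```r
# Hypothetical list of data frames with the same ids in the same order
lst1 <- list(
  data.frame(id = c("id1", "id2", "id3"), value = c(0, 0.5, NA)),
  data.frame(id = c("id1", "id2", "id3"), value = c(0.5, 0, 1))
)

# One column per data frame, one row per id
value_mat <- sapply(lst1, `[[`, "value")

# Mean across data frames for each id, skipping NA values
row_means <- rowMeans(value_mat, na.rm = TRUE)
names(row_means) <- lst1[[1]]$id

print(row_means)
```

Here id3 is averaged over the single non-missing value, whereas an elementwise sum divided by the list length would return NA for it.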