dplyr summarise_each with na.rm
Following the links in the doc, it seems you can use funs(mean(., na.rm = TRUE)):
library(dplyr)
by_species <- iris %>% group_by(Species)
by_species %>% summarise_each(funs(mean(., na.rm = TRUE)))
Saving na.rm=TRUE for each function in dplyr
You should use summarise_at, which lets you compute multiple functions for the supplied columns and set arguments that are shared among them:
df %>%
  group_by(group) %>%
  summarise_at("value",
               funs(mean = mean, sd = sd, min = min),
               na.rm = TRUE)
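Since funs() is deprecated in newer dplyr releases, the same idea can be sketched with across(), assuming dplyr 1.0.0 or later. The toy data frame below is hypothetical; only the column names group and value are taken from the answer above.

```r
library(dplyr)

# Hypothetical toy data; `group` and `value` mirror the names used above
df <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(1, NA, 3, 5)
)

# Each lambda passes na.rm = TRUE explicitly, so NAs are dropped per function
res <- df %>%
  group_by(group) %>%
  summarise(across(
    value,
    list(
      mean = ~ mean(.x, na.rm = TRUE),
      sd   = ~ sd(.x, na.rm = TRUE),
      min  = ~ min(.x, na.rm = TRUE)
    )
  ))

res
```

The result columns are named value_mean, value_sd and value_min. Group "a" has only one non-missing value, so its standard deviation is NA.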
How to Use na.rm=TRUE with n() While Using Dplyr's Group_by and Summarise_at
I think your code was very close to getting the job done. I made some slight changes and have included an example of how you might include the percent calculation in the same step (although I am not sure of your expected output).
library(dplyr)
Df %>%
  group_by(Group) %>%
  summarise_all(funs(count = sum(!is.na(.)),
                     sum = sum(., na.rm = TRUE),
                     pct = sum(., na.rm = TRUE) / sum(!is.na(.))))
#> # A tibble: 2 x 10
#> Group Var1_count Var2_count Var3_count Var1_sum Var2_sum Var3_sum
#> <fctr> <int> <int> <int> <dbl> <dbl> <dbl>
#> 1 Condo 2 2 2 1 2 1
#> 2 House 5 6 4 4 5 4
#> # ... with 3 more variables: Var1_pct <dbl>, Var2_pct <dbl>,
#> # Var3_pct <dbl>
I've also used summarise_all instead of summarise_at, as summarise_all works on all the variables which aren't group variables.
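For newer dplyr versions (1.0.0 or later is assumed here), a rough across() equivalent might look like the sketch below. The data frame is a made-up stand-in for Df with a single variable. n() itself has no na.rm argument, which is why the count is written as sum(!is.na(.x)).

```r
library(dplyr)

# Hypothetical stand-in for Df; only the column names follow the answer above
Df <- data.frame(
  Group = c("Condo", "Condo", "House", "House", "House"),
  Var1  = c(1, NA, 1, 0, 1)
)

res <- Df %>%
  group_by(Group) %>%
  summarise(across(
    everything(),                       # all non-grouping columns
    list(
      count = ~ sum(!is.na(.x)),        # number of non-missing values
      sum   = ~ sum(.x, na.rm = TRUE),
      pct   = ~ mean(.x, na.rm = TRUE)  # same as sum / count of non-missing values
    )
  ))

res
```

Writing pct as mean(.x, na.rm = TRUE) is a shortcut: dividing the NA-free sum by the number of non-missing values is exactly the NA-free mean.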
Problem using na.rm=TRUE in summarize in R code
Base R has no built-in function for the statistical mode, so if we want to find it we can define a helper, here called Mode:
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
With Mode defined, the summary should now work:
Test %>%
  group_by(Week = tools::toTitleCase(Week)) %>%
  summarize(Mode = Mode(time), .groups = 'drop')
# A tibble: 2 × 2
Week Mode
<chr> <dbl>
1 Thursday 0
2 Wednesday 5
If we want to add an na.rm option, it should be an argument of the function, and it should also be passed on to max:
Test1 <- function(t, rm_na) {
  s <- table(as.vector(t))
  names(s)[s %in% max(s, na.rm = rm_na)]
}
and use the function as
Test %>%
group_by(Week = tools::toTitleCase(Week)) %>%
summarize(Mode=Test1(time, TRUE),.groups = 'drop')
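As an alternative sketch, the Mode helper can take an na.rm argument itself and drop missing values before counting. The name mode_value and the test vector are made up for illustration. Unlike Test1, it returns a single value of the original type (the first one on ties).

```r
# Hypothetical helper: statistical mode with an optional na.rm switch
mode_value <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]                    # drop missing values before counting
  }
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]  # most frequent value, first on ties
}

x <- c(1, NA, NA, NA, 1, 2)
mode_value(x)                # NA is the most frequent value here
mode_value(x, na.rm = TRUE)  # 1
```

Inside summarize it would be called as mode_value(time, na.rm = TRUE), in the same position as Test1(time, TRUE).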
How can I in R, group by ID and summarise by mean with na.rm = TRUE
Use a lambda (~):
library(dplyr)
ID_x %>%
group_by(ID) %>%
summarise_each(~ mean(., na.rm=TRUE))
-output
# A tibble: 3 × 2
ID x
<dbl> <dbl>
1 1 2.5
2 2 2.5
3 3 1
Also, in recent versions, summarise_each emits a warning because it is deprecated in favor of across:
ID_x %>%
group_by(ID) %>%
summarise(across(everything(), ~ mean(., na.rm=TRUE)))
Remove NAs in function list for dplyr's across
One option could be:
iris %>%
group_by(Species) %>%
summarise(across(c(Sepal.Length:Petal.Width),
list(mean = ~ mean(., na.rm = TRUE), sd = ~ sd(., na.rm = TRUE))))
Summarise_each for first non-NA value
You can use first(na.omit(.)) or na.omit(.)[1]. Also, summarise_each is deprecated; use summarise_all instead.
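A minimal sketch of the na.omit(.)[1] variant, written with across() and assuming dplyr 1.0.0 or later. The data frame and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical toy data; group "b" has no non-missing value in x
df <- data.frame(
  g = c("a", "a", "a", "b", "b"),
  x = c(NA, 2, 3, NA, NA),
  y = c(1, NA, 5, NA, 7)
)

# na.omit(.x)[1] picks the first non-NA value and yields NA when none exists
res <- df %>%
  group_by(g) %>%
  summarise(across(everything(), ~ na.omit(.x)[1]))

res
```

When a group contains only NA values, indexing the empty result with [1] gives NA rather than an error, so every group still produces exactly one row.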
Using dplyr summarise_each() with is.na()
Here's a possibility, tested on a small data set with some NA:
df <- data.frame(a = rep(1:2, each = 3),
b = c(1, 1, NA, 1, NA, NA),
c = c(1, 1, 1, NA, NA, NA))
df
# a b c
# 1 1 1 1
# 2 1 1 1
# 3 1 NA 1
# 4 2 1 NA
# 5 2 NA NA
# 6 2 NA NA
df %>%
group_by(a) %>%
summarise_each(funs(sum(is.na(.)) / length(.)))
# a b c
# 1 1 0.3333333 0
# 2 2 0.6666667 1
And because you asked for pointers to documentation: the . refers to each piece of the data, and is used in some Examples in ?summarize_each. It is described in the Arguments section of ?funs as a "dummy parameter", and is used in the Examples. The . is also briefly described in the Arguments section of ?do: "... You can use . to refer to the current group"
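In newer dplyr versions the same proportion of missing values can be sketched with across() (dplyr 1.0.0 or later is assumed), where .x plays the role of the . placeholder. The data frame is the one defined above; mean(is.na(.x)) is just a shorter way of writing sum(is.na(.x)) / length(.x).

```r
library(dplyr)

df <- data.frame(a = rep(1:2, each = 3),
                 b = c(1, 1, NA, 1, NA, NA),
                 c = c(1, 1, 1, NA, NA, NA))

# Share of NA values per group and column
na_share <- df %>%
  group_by(a) %>%
  summarise(across(everything(), ~ mean(is.na(.x))))

na_share
```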