dplyr summarise_each with na.rm


Following the links in the documentation, you can use funs(mean(., na.rm = TRUE)):

library(dplyr)
by_species <- iris %>% group_by(Species)
by_species %>% summarise_each(funs(mean(., na.rm = TRUE)))
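iris has no missing values, so na.rm = TRUE makes no difference in that example. The sketch below uses a small, made-up data frame (the name toy and its columns are hypothetical) to show the effect. It uses the across() style that dplyr 1.0.0 and later recommend instead of summarise_each() and funs():

```r
library(dplyr)

# Hypothetical toy data: one grouping column and one numeric column with an NA
toy <- data.frame(g = c("a", "a", "b", "b"),
                  v = c(1, NA, 3, 5))

# Without na.rm, any NA in a group makes that group's mean NA
without_rm <- toy %>%
  group_by(g) %>%
  summarise(across(everything(), mean), .groups = "drop")

# With na.rm = TRUE inside a lambda, NAs are dropped before averaging
with_rm <- toy %>%
  group_by(g) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)), .groups = "drop")

without_rm
with_rm
```

Grouping columns are excluded from everything() inside a grouped summarise, so only v is summarised.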

Saving na.rm=TRUE for each function in dplyr

You should use summarise_at, which lets you apply several functions to the supplied columns and pass arguments that are shared among them:

df %>%
  group_by(group) %>%
  summarise_at("value",
               funs(mean = mean, sd = sd, min = min),
               na.rm = TRUE)
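In current dplyr, summarise_at() is superseded by across(), and passing extra arguments such as na.rm through across()'s ... is deprecated as of dplyr 1.1.0. One way to still write na.rm = TRUE only once is to wrap each function before handing the list to across(). This is a sketch, assuming a data frame df with columns group and value (the values below are made up); the helper with_na_rm is hypothetical and not part of dplyr:

```r
library(dplyr)

# Hypothetical example data with a missing value in each group
df <- data.frame(group = c("x", "x", "x", "y", "y"),
                 value = c(2, 4, NA, 10, NA))

# Hypothetical helper: wrap every function so it is called with na.rm = TRUE
with_na_rm <- function(fns) {
  lapply(fns, function(f) {
    force(f)                        # capture this particular function
    function(x) f(x, na.rm = TRUE)
  })
}

fns_no_na <- with_na_rm(list(mean = mean, sd = sd, min = min))

result <- df %>%
  group_by(group) %>%
  summarise(across(value, fns_no_na, .names = "{.fn}"), .groups = "drop")

result
```

lapply() keeps the list names, so across() can use them for the output columns. Group y has a single non-missing value, so its sd is NA.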

How to use na.rm=TRUE with n() while using dplyr's group_by and summarise_at

I think your code was very close to getting the job done. I made some slight changes and have included an example of how you might include the percent calculation in the same step (although I am not sure of your expected output).


library(dplyr)
Df %>%
  group_by(Group) %>%
  summarise_all(funs(count = sum(!is.na(.)),            # non-missing values per column
                     sum   = sum(., na.rm = TRUE),
                     pct   = sum(., na.rm = TRUE) / sum(!is.na(.))))

#> # A tibble: 2 x 10
#> Group Var1_count Var2_count Var3_count Var1_sum Var2_sum Var3_sum
#> <fctr> <int> <int> <int> <dbl> <dbl> <dbl>
#> 1 Condo 2 2 2 1 2 1
#> 2 House 5 6 4 4 5 4
#> # ... with 3 more variables: Var1_pct <dbl>, Var2_pct <dbl>,
#> # Var3_pct <dbl>

I've also used summarise_all instead of summarise_at, because summarise_all applies the functions to every variable that is not a grouping variable.
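The reason for sum(!is.na(.)) rather than n() is that n() counts every row in the group, missing or not. A small sketch with made-up data (the data frame dat is hypothetical) contrasts the two counts using across() with a named list of lambdas. Note that pct above, the sum divided by the non-missing count, equals mean(., na.rm = TRUE):

```r
library(dplyr)

# Hypothetical data: Var1 has one missing value in group "A"
dat <- data.frame(Group = c("A", "A", "A", "B", "B"),
                  Var1  = c(1, 0, NA, 1, 1))

counts <- dat %>%
  group_by(Group) %>%
  summarise(
    rows = n(),                          # all rows, including NA
    across(Var1, list(
      count = ~ sum(!is.na(.x)),         # non-missing values only
      sum   = ~ sum(.x, na.rm = TRUE),
      pct   = ~ mean(.x, na.rm = TRUE)   # same as sum / count
    )),
    .groups = "drop"
  )

counts
```

For group A, rows is 3 but Var1_count is 2, which is the difference the question ran into.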

Problem using na.rm=TRUE in summarize in R code

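If we want to find the mode, use a Mode helper (base R's mode() returns an object's type, such as "numeric", not its most frequent value):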

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]   # most frequent value; first one wins on ties
}

With that helper defined, the summary works:

Test %>%
  group_by(Week = tools::toTitleCase(Week)) %>%
  summarize(Mode = Mode(time), .groups = 'drop')
# A tibble: 2 × 2
Week Mode
<chr> <dbl>
1 Thursday 0
2 Wednesday 5

If we want to pass na.rm, it should be an argument of the function, and max() should receive that argument as well:

Test1 <- function(t, rm_na) {
  s <- table(as.vector(t))                 # counts per value; table() drops NA by default
  names(s)[s %in% max(s, na.rm = rm_na)]   # value(s) with the highest count, as character
}

and call it as:

Test %>%
  group_by(Week = tools::toTitleCase(Week)) %>%
  summarize(Mode = Test1(time, TRUE), .groups = 'drop')
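Test1 returns the names of the most frequent value(s) as character strings, and more than one value on ties. If you instead want a Mode that can ignore missing values and returns a single value of the original type, one minimal sketch in base R is to drop the NAs before counting. mode_na is a hypothetical helper, not a base R function:

```r
# Hypothetical helper: mode of x, optionally ignoring NA values
mode_na <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  if (length(x) == 0) return(x[NA_integer_])  # nothing left: NA of the same type
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]       # first value reached on ties
}

x <- c(NA, NA, NA, 5, 5, 0)
mode_na(x)                # NA is the most frequent "value"
mode_na(x, na.rm = TRUE)  # 5
```

In the pipeline above it would be called as summarize(Mode = mode_na(time, na.rm = TRUE), .groups = 'drop').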

How can I, in R, group by ID and summarise by mean with na.rm = TRUE?

Use the lambda (~) syntax:

library(dplyr)
ID_x %>%
  group_by(ID) %>%
  summarise_each(~ mean(., na.rm = TRUE))

Output:

# A tibble: 3 × 2
ID x
<dbl> <dbl>
1 1 2.5
2 2 2.5
3 3 1

Also, in recent versions summarise_each emits a deprecation warning, because it is deprecated in favor of across:

ID_x %>%
  group_by(ID) %>%
  summarise(across(everything(), ~ mean(., na.rm = TRUE)))

Remove NAs in function list for dplyr's across

One option could be:

iris %>%
  group_by(Species) %>%
  summarise(across(c(Sepal.Length:Petal.Width),
                   list(mean = ~ mean(., na.rm = TRUE), sd = ~ sd(., na.rm = TRUE))))
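With a named list of functions, across() names the output columns "{.col}_{.fn}" by default, so the iris call above produces Sepal.Length_mean, Sepal.Length_sd, and so on. A small sketch on hypothetical data with missing values (the data frame m and its columns are made up); the .names argument here is just one way to change that naming:

```r
library(dplyr)

# Hypothetical data with one NA in each measurement column
m <- data.frame(grp = c("a", "a", "b", "b"),
                len = c(1, 3, NA, 6),
                wid = c(2, NA, 4, 8))

res <- m %>%
  group_by(grp) %>%
  summarise(across(c(len, wid),
                   list(mean = ~ mean(.x, na.rm = TRUE),
                        sd   = ~ sd(.x, na.rm = TRUE)),
                   .names = "{.fn}_{.col}"),
            .groups = "drop")

names(res)
# expected: grp, mean_len, sd_len, mean_wid, sd_wid
```

Outputs are ordered by column first, then by function. A group left with a single non-missing value gets an NA standard deviation.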

Summarise_each for first non-NA value

You can use first(na.omit(.)) or na.omit(.)[1]. Also, summarise_each is deprecated; use summarise_all instead (itself now superseded by across()).
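A sketch of the na.omit(.)[1] idea with across(), on hypothetical data (the data frame obs and its columns are made up). When a group has no non-missing values, na.omit() leaves an empty vector and [1] then yields NA, because indexing past the end of a vector returns NA:

```r
library(dplyr)

# Hypothetical data: group "q" has no non-missing value in column b
obs <- data.frame(id = c("p", "p", "p", "q", "q"),
                  a  = c(NA, 2, 3, 7, NA),
                  b  = c(NA, NA, 9, NA, NA))

firsts <- obs %>%
  group_by(id) %>%
  # [1] also drops the "na.action" attribute that na.omit() attaches
  summarise(across(everything(), ~ na.omit(.x)[1]), .groups = "drop")

firsts
```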

Using dplyr summarise_each() with is.na()

Here's a possibility, tested on a small data set with some NAs:

df <- data.frame(a = rep(1:2, each = 3),
                 b = c(1, 1, NA, 1, NA, NA),
                 c = c(1, 1, 1, NA, NA, NA))

df
# a b c
# 1 1 1 1
# 2 1 1 1
# 3 1 NA 1
# 4 2 1 NA
# 5 2 NA NA
# 6 2 NA NA

df %>%
  group_by(a) %>%
  summarise_each(funs(sum(is.na(.)) / length(.)))
# a b c
# 1 1 0.3333333 0
# 2 2 0.6666667 1

And because you asked for pointers to documentation: the . refers to each piece of the data and is used in some of the Examples in ?summarize_each. It is described in the Arguments section of ?funs as a "dummy parameter" and is used in the Examples there. The . is also briefly described in the Arguments section of ?do: "... You can use . to refer to the current group".
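For comparison, here is a sketch of the same NA-proportion summary with across(), recreating the small df from above. In a ~ lambda the current column is usually written .x (the . also works there). mean(is.na(.x)) is equivalent to sum(is.na(.)) / length(.) because the mean of a logical vector is the proportion of TRUE values:

```r
library(dplyr)

# Same small data set as above
df <- data.frame(a = rep(1:2, each = 3),
                 b = c(1, 1, NA, 1, NA, NA),
                 c = c(1, 1, 1, NA, NA, NA))

# mean() of a logical vector is the share of TRUE values, i.e. the NA rate
na_share <- df %>%
  group_by(a) %>%
  summarise(across(everything(), ~ mean(is.na(.x))), .groups = "drop")

na_share
```

This should reproduce the proportions shown above: b is 1/3 and 2/3, c is 0 and 1.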


