Total Mean & Mean by Groups in R with Dplyr


Try this:

library(dplyr)

df %>%
  mutate(avg = mean(speed)) %>%   # overall mean, computed before grouping
  group_by(dive) %>%
  summarise(Avg_group = mean(speed), Total_Mean = first(avg))
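A self-contained sketch of the same idea. It assumes a hypothetical data frame with a grouping column `dive` and a numeric column `speed`, and the values are made up for illustration:

```r
library(dplyr)

# Hypothetical data: three dives with made-up speed readings
df <- data.frame(
  dive  = c("a", "a", "b", "b", "c", "c"),
  speed = c(1, 3, 4, 6, 8, 14)
)

result <- df %>%
  mutate(avg = mean(speed)) %>%   # overall mean, computed before any grouping
  group_by(dive) %>%
  summarise(
    Avg_group  = mean(speed),     # mean within each dive
    Total_Mean = first(avg)       # the same overall mean, repeated on every row
  )

print(result)
```

The order matters: `avg` is the total mean only because `mutate()` runs before `group_by()`. The same `mutate()` placed after `group_by()` would give the group mean instead.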

Calculate mean by group using dplyr package

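A likely reason is that the plyr package was loaded after dplyr. plyr also has a summarise() (and summarize()), which then masks dplyr's version and ignores group_by(). Calling dplyr::summarize explicitly avoids the conflict: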

library(dplyr)
library(ggplot2)  # provides the diamonds dataset

diamonds %>%
  group_by(cut) %>%
  dplyr::summarize(Mean = mean(price, na.rm = TRUE))
# A tibble: 5 x 2
# cut Mean
# <ord> <dbl>
#1 Fair 4358.758
#2 Good 3928.864
#3 Very Good 3981.760
#4 Premium 4584.258
#5 Ideal 3457.542

If plyr::summarize is used instead, the grouping is ignored and a single overall mean is returned:

diamonds %>%
  group_by(cut) %>%
  plyr::summarize(Mean = mean(price, na.rm = TRUE))
# Mean
#1 3932.8
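A minimal sketch of the difference, using a small hypothetical data frame in place of `diamonds` so that plyr does not need to be installed. Both calls are namespace-qualified with `dplyr::`, so the result does not depend on package load order. The ungrouped call reproduces the single overall value that a grouping-unaware `summarise()` such as plyr's returns:

```r
library(dplyr)

# Hypothetical data: two groups with made-up prices
prices <- data.frame(cut = c("x", "x", "y", "y"), price = c(10, 20, 30, 50))

# Grouped summary: one row per level of `cut`
by_group <- prices %>%
  group_by(cut) %>%
  dplyr::summarise(Mean = mean(price, na.rm = TRUE))

# Ungrouped summary: a single overall mean, the shape of result that
# plyr::summarise gives because it ignores the grouping set by group_by()
overall <- prices %>%
  ungroup() %>%
  dplyr::summarise(Mean = mean(price, na.rm = TRUE))

print(by_group)
print(overall)
```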

How to use group_by with mean and sum in dplyr?

If I understood correctly, this might help you

#Libraries

library(tidyverse)

#Data

df <-
tibble::tribble(
~Year, ~School.Name, ~Student.Score1, ~Student.Score2,
2019L, "ISD 1", 1L, NA,
2020L, "ISD 4", 4L, 2L,
2020L, "ISD 3", NA, 3L,
2018L, "ISD 1", 4L, NA,
2019L, "ISD 4", 2L, 5L,
2020L, "ISD 4", 3L, 2L,
2019L, "ISD 3", NA, 1L,
2018L, "ISD 1", 2L, 4L
)

#How to

df %>%
  group_by(Year, School.Name) %>%
  summarise(
    n = n(),
    across(.cols = contains(".Score"), .fns = function(x) mean(x, na.rm = TRUE))
  )

# A tibble: 6 x 5
# Groups: Year [3]
Year School.Name n Student.Score1 Student.Score2
<int> <chr> <int> <dbl> <dbl>
1 2018 ISD 1 2 3 4
2 2019 ISD 1 1 1 NaN
3 2019 ISD 3 1 NaN 1
4 2019 ISD 4 1 2 5
5 2020 ISD 3 1 NaN 3
6 2020 ISD 4 2 3.5 2
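The question also asks for sums. `across()` accepts a named list of functions, and by default it names the outputs `{column}_{function}`. This sketch uses a cut-down copy of the example data:

```r
library(dplyr)

# Cut-down copy of the example data (one year, two schools)
scores <- tibble::tribble(
  ~Year, ~School.Name, ~Student.Score1, ~Student.Score2,
  2020L, "ISD 4",      4L,              2L,
  2020L, "ISD 4",      3L,              2L,
  2020L, "ISD 3",      NA,              3L
)

result <- scores %>%
  group_by(Year, School.Name) %>%
  summarise(
    n = n(),
    across(
      .cols = contains(".Score"),
      .fns  = list(
        mean = ~ mean(.x, na.rm = TRUE),
        sum  = ~ sum(.x, na.rm = TRUE)
      )
    ),
    .groups = "drop"  # return an ungrouped tibble
  )

print(result)
```

A group whose values are all `NA` gets `NaN` for the mean and `0` for the sum, because `na.rm = TRUE` leaves nothing to average.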

Calculate Group Mean and Overall Mean

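Here is one more dplyr solution. For each level of Y, it takes the mean of every other column. It then appends the overall column means as a third row and divides every row by that row. The first two rows therefore express each group mean relative to the overall mean. This assumes Y has exactly two levels:

library(dplyr)
index <- as.data.frame(
  Data %>%
    group_by(Y) %>%
    summarise_all(mean) %>%                                  # group means, one row per level of Y
    select(-Y) %>%
    rbind(Data %>% select(-Y) %>% summarise_all(mean)) %>%   # row 3: overall means
    mutate_all(~ . / .[3])                                   # divide each column by its overall mean
)[1:2, ]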
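The same ratio can be computed without hard-coding row 3 or the number of groups. This sketch looks up each column's overall mean by name with `cur_column()` (dplyr 1.0.0 or later). `Data`, `Y`, `v1` and `v2` are hypothetical stand-ins for the question's data:

```r
library(dplyr)

# Hypothetical data: grouping column Y and two numeric columns
Data <- data.frame(
  Y  = c("g1", "g1", "g2", "g2"),
  v1 = c(1, 3, 5, 7),
  v2 = c(12, 12, 30, 26)
)

# One row holding the overall mean of every non-grouping column
overall <- Data %>% summarise(across(-Y, mean))

index <- Data %>%
  group_by(Y) %>%
  summarise(across(everything(), mean)) %>%             # group means
  mutate(across(-Y, ~ .x / overall[[cur_column()]]))    # divide by the overall mean

print(index)
```

The grouping column is kept in the result, so each row is labeled with its group, and any number of levels works.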

Mean per group in a data.frame

This type of operation is exactly what aggregate was designed for:

d <- read.table(text=
'Name Month Rate1 Rate2
Aira 1 12 23
Aira 2 18 73
Aira 3 19 45
Ben 1 53 19
Ben 2 22 87
Ben 3 19 45
Cat 1 22 87
Cat 2 67 43
Cat 3 45 32', header=TRUE)

aggregate(d[, 3:4], list(d$Name), mean)

Group.1 Rate1 Rate2
1 Aira 16.33333 47.00000
2 Ben 31.33333 50.33333
3 Cat 44.66667 54.00000

Here we aggregate columns 3 and 4 of data.frame d, grouping by d$Name, and applying the mean function.


Or, using the formula interface (d[-2] drops the Month column so that it is not averaged):

aggregate(. ~ Name, d[-2], mean)
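For comparison, a sketch of the dplyr equivalent on the same data. It should give the same per-name means as `aggregate()`:

```r
library(dplyr)

d <- read.table(text =
'Name Month Rate1 Rate2
Aira 1 12 23
Aira 2 18 73
Aira 3 19 45
Ben 1 53 19
Ben 2 22 87
Ben 3 19 45
Cat 1 22 87
Cat 2 67 43
Cat 3 45 32', header = TRUE)

# Group by Name and average only the two rate columns (Month is left out)
rates <- d %>%
  group_by(Name) %>%
  summarise(across(c(Rate1, Rate2), mean))

print(rates)
```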

Calculate a mean by groups in R

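The idea is to reshape the data from wide format into long format, then group it and summarize it as follows. Note that data.frame() renames the year columns to X1990, X1991, ... because names starting with a digit are not syntactic, which is why the output shows them that way: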

library(dplyr)
library(tidyr)

homicide_ratios <-
data.frame(
Mainland = c("Europe", "Asia", "Oceania", "Americas", "Africa"),
"1990" = c(1, 2, 3, 4, 5),
"1991" = c(1, 2, 3, 4, 5),
"1992" = c(1, 2, 3, 4, 5),
"1993" = c(1, 2, 3, 4, 5)
)

homicide_ratios %>%
  gather(key = "year", value = "rate", -Mainland) %>%
  group_by(Mainland, year) %>%
  summarize(average = mean(rate))

# # A tibble: 20 x 3
# # Groups: Mainland [5]
# Mainland year average
# <fct> <chr> <dbl>
# Africa X1990 5
# Africa X1991 5
# Africa X1992 5
# Africa X1993 5
# Americas X1990 4
# Americas X1991 4
# Americas X1992 4
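In current tidyr, gather() is superseded by pivot_longer(). Here is a sketch of the same reshape with it. Grouping by both Mainland and year leaves a single value per group, so this sketch also groups by Mainland alone to average across years. That is an assumption about what "mean by groups" was meant to cover:

```r
library(dplyr)
library(tidyr)

# Same toy data; check.names = FALSE keeps the year columns as "1990", ... instead of "X1990", ...
homicide_ratios <- data.frame(
  Mainland = c("Europe", "Asia", "Oceania", "Americas", "Africa"),
  "1990" = c(1, 2, 3, 4, 5),
  "1991" = c(1, 2, 3, 4, 5),
  "1992" = c(1, 2, 3, 4, 5),
  "1993" = c(1, 2, 3, 4, 5),
  check.names = FALSE
)

# Wide to long: one row per Mainland and year
long <- homicide_ratios %>%
  pivot_longer(-Mainland, names_to = "year", values_to = "rate")

# Average over all years for each mainland
by_mainland <- long %>%
  group_by(Mainland) %>%
  summarise(average = mean(rate))

print(by_mainland)
```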

dplyr: mean of a group count

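With dplyr, group by the `% Bucket` column and take the mean of count within each bucket. The backticks are needed because the column name is not syntactic: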

library(dplyr)
x %>%
group_by(`% Bucket`) %>%
summarise(count= mean(count))
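The question's `x` is not shown, so here is a self-contained sketch on a hypothetical `x` with the same two columns. The bucket labels and counts are made up:

```r
library(dplyr)

# Hypothetical data: a non-syntactic `% Bucket` column and a numeric count
x <- tibble::tibble(
  `% Bucket` = c("0-10%", "0-10%", "10-20%", "10-20%", "10-20%"),
  count      = c(4, 6, 1, 2, 6)
)

bucket_means <- x %>%
  group_by(`% Bucket`) %>%
  summarise(count = mean(count))  # overwrites `count` with its per-bucket mean

print(bucket_means)
```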

How to calculate mean of all columns, by group?

Edit 2: Recent versions of dplyr (1.0.0 and later) suggest using regular summarise() with the across() function, as in:

library(dplyr)
mtcars %>%
group_by(cyl, gear) %>%
summarise(across(everything(), mean))

What you're looking for is either summarise_all() or summarise_each() from dplyr (see ?summarise_all). summarise_each() has since been deprecated.

Edit: full code:

library(dplyr)
mtcars %>%
group_by(cyl, gear) %>%
summarise_all("mean")

# Source: local data frame [8 x 11]
# Groups: cyl [?]
#
# cyl gear mpg disp hp drat wt qsec vs am carb
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 4 3 21.500 120.1000 97.0000 3.700000 2.465000 20.0100 1.0 0.00 1.000000
# 2 4 4 26.925 102.6250 76.0000 4.110000 2.378125 19.6125 1.0 0.75 1.500000
# 3 4 5 28.200 107.7000 102.0000 4.100000 1.826500 16.8000 0.5 1.00 2.000000
# 4 6 3 19.750 241.5000 107.5000 2.920000 3.337500 19.8300 1.0 0.00 1.000000
# 5 6 4 19.750 163.8000 116.5000 3.910000 3.093750 17.6700 0.5 0.50 4.000000
# 6 6 5 19.700 145.0000 175.0000 3.620000 2.770000 15.5000 0.0 1.00 6.000000
# 7 8 3 15.050 357.6167 194.1667 3.120833 4.104083 17.1425 0.0 0.00 3.083333
# 8 8 5 15.400 326.0000 299.5000 3.880000 3.370000 14.5500 0.0 1.00 6.000000
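A sketch of the across() form with `.groups = "drop"`, so the result comes back ungrouped instead of still grouped by cyl. The checks below use values from the printed table above:

```r
library(dplyr)

# Mean of every non-grouping column for each cyl/gear combination
by_cyl_gear <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(across(everything(), mean), .groups = "drop")

print(by_cyl_gear)
```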

Using R & dplyr to summarize - group_by, count, mean, sd

Even though answered via comments, I felt such a nice reproducible example for a very first question deserved an official answer.

library(dplyr)
set.seed(123)
var1 <- rnorm(15, mean=2, sd=1)
var2 <- c(rep("A", 5), rep("B", 5), rep("C", 5))
df <- data.frame(var1, var2)
df_stat <- df %>%
  group_by(var2) %>%
  summarize(
    count = n(),
    mean = mean(var1, na.rm = TRUE),
    sd = sd(var1, na.rm = TRUE)
  )
head(df_stat)
# A tibble: 3 x 4
# var2 count mean sd
# <fct> <int> <dbl> <dbl>
# 1 A 5 2.19 0.811
# 2 B 5 1.96 1.16
# 3 C 5 2.31 0.639
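One subtlety with na.rm = TRUE: n() counts every row in the group, including rows where var1 is NA, while mean() and sd() only use the non-missing values. This sketch adds a hypothetical `n_valid` column to make the difference visible. The data are made up, with one NA in group A:

```r
library(dplyr)

# Hypothetical data with one missing value in group "A"
df <- data.frame(
  var1 = c(1, 2, NA, 4, 6),
  var2 = c("A", "A", "A", "B", "B")
)

df_stat <- df %>%
  group_by(var2) %>%
  summarize(
    count   = n(),                  # all rows, including the NA
    n_valid = sum(!is.na(var1)),    # rows actually used by mean() and sd()
    mean    = mean(var1, na.rm = TRUE),
    sd      = sd(var1, na.rm = TRUE)
  )

print(df_stat)
```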

