Calculate Mean by Group Using Dplyr Package

Calculate mean by group using dplyr package

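A likely reason for getting one overall mean instead of one mean per group is that the plyr package was loaded after dplyr. plyr also exports summarise (and summarize), which masks the dplyr version and ignores the grouping. Calling the dplyr function explicitly gives the expected result: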

library(dplyr)
library(ggplot2)  # provides the diamonds data set

diamonds %>%
  group_by(cut) %>%
  dplyr::summarize(Mean = mean(price, na.rm = TRUE))
# A tibble: 5 x 2
#   cut           Mean
#   <ord>        <dbl>
# 1 Fair      4358.758
# 2 Good      3928.864
# 3 Very Good 3981.760
# 4 Premium   4584.258
# 5 Ideal     3457.542

If we use plyr::summarize instead, the grouping is ignored and a single overall mean is returned:

diamonds %>%
  group_by(cut) %>%
  plyr::summarize(Mean = mean(price, na.rm = TRUE))
#     Mean
# 1 3932.8
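
To check which package's summarise is currently in effect, the namespace of the visible function can be inspected. The following is a minimal sketch, assuming only dplyr is attached and using the built-in mtcars data so that it is self-contained; the names summarise_source and by_cyl are illustrative.

```r
library(dplyr)

# Name of the namespace that the currently visible `summarise` belongs to.
# If plyr had been attached after dplyr, this would report "plyr" instead.
summarise_source <- environmentName(environment(summarise))
print(summarise_source)

# A namespace-qualified call does not depend on the attach order.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  dplyr::summarise(Mean = mean(mpg, na.rm = TRUE))
print(by_cyl)
```

If the reported namespace is not dplyr, either qualify the call as above or detach the masking package.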

Calculate mean by group with dplyr

You were almost there:

Data %>%
  group_by(CodeProject) %>%
  summarise(
    n = n(),
    mean_pr = mean(Price, na.rm = TRUE))
# A tibble: 2 x 3
#   CodeProject     n mean_pr
#   <fct>       <int>   <dbl>
# 1 Pr1             3    4.00
# 2 Pr2             2    7.50

Calculating mean by group using dplyr in R

We can use mutate instead of summarise to add the group mean as a new column while keeping every row:

library(dplyr)
df <- df %>%
  group_by(class) %>%
  mutate(Mean = mean(x)) %>%
  ungroup()

-output

df
# A tibble: 6 x 3
        x class    Mean
    <dbl> <dbl>   <dbl>
1  2.43       1  1.05
2  0.0625     1  1.05
3  0.669      1  1.05
4  0.195      2 -0.0550
5  0.285      2 -0.0550
6 -0.644      2 -0.0550

data

df <- data.frame(x, class)
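
The difference between mutate and summarise after group_by can be seen side by side. This is a small sketch with made-up values (the original x and class vectors are not shown, so the data here is hypothetical): mutate keeps all rows and repeats the group mean, while summarise collapses to one row per group.

```r
library(dplyr)

# Hypothetical data; not the values used in the answer above.
df <- data.frame(
  x     = c(1, 2, 3, 10, 20, 30),
  class = c(1, 1, 1, 2, 2, 2)
)

# mutate: same number of rows as the input, group mean repeated per row.
with_mean <- df %>%
  group_by(class) %>%
  mutate(Mean = mean(x)) %>%
  ungroup()

# summarise: one row per group.
per_group <- df %>%
  group_by(class) %>%
  summarise(Mean = mean(x))

print(with_mean)
print(per_group)
```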

How to calculate mean by row for multiple groups using dplyr in R?

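After grouping by 'Age' and 'Location', we may use == or %in% to subset 'Value' by the 'Distance' values. This assumes the stored 'Distance' values match the literals exactly, which is not guaranteed for floating-point numbers that were computed rather than typed in: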

library(dplyr)
df1 %>%
  group_by(Age, Location) %>%
  summarise(Mean_0.5 = mean(Value[Distance == 0.5]),
            Mean_1.5_and_2.5 = mean(Value[Distance %in% c(1.5, 2.5)]),
            .groups = 'drop')

-output

# A tibble: 4 × 4
    Age Location Mean_0.5 Mean_1.5_and_2.5
  <dbl> <chr>       <dbl>            <dbl>
1     1 Central      206.             202.
2     1 North        210.             201.
3     2 Central      193              186.
4     2 North        202.             214.
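
When the 'Distance' values come from arithmetic, exact comparison with == can silently miss rows. One possible workaround is dplyr::near, which compares with a small tolerance. The sketch below uses hypothetical data (not the df1 from the question) in which one distance is computed as 0.1 + 0.2 and therefore is not exactly 0.3.

```r
library(dplyr)

# Hypothetical data: the first Distance is computed, not typed in.
df1 <- data.frame(
  Age      = c(1, 1, 1, 1),
  Location = c("North", "North", "North", "North"),
  Distance = c(0.1 + 0.2, 0.3, 1.5, 2.5),
  Value    = c(10, 20, 30, 50)
)

res <- df1 %>%
  group_by(Age, Location) %>%
  summarise(
    exact    = mean(Value[Distance == 0.3]),      # misses the computed 0.3
    tolerant = mean(Value[near(Distance, 0.3)]),  # matches both rows
    .groups  = "drop"
  )
print(res)
```

Here the exact comparison averages only one row, while the tolerant comparison averages both.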

Mean per group in a data.frame

This type of operation is exactly what aggregate was designed for:

d <- read.table(text=
'Name Month Rate1 Rate2
Aira 1 12 23
Aira 2 18 73
Aira 3 19 45
Ben 1 53 19
Ben 2 22 87
Ben 3 19 45
Cat 1 22 87
Cat 2 67 43
Cat 3 45 32', header=TRUE)

aggregate(d[, 3:4], list(d$Name), mean)

  Group.1    Rate1    Rate2
1    Aira 16.33333 47.00000
2     Ben 31.33333 50.33333
3     Cat 44.66667 54.00000

Here we aggregate columns 3 and 4 of data.frame d, grouping by d$Name, and applying the mean function.


Or, using a formula interface:

aggregate(. ~ Name, d[-2], mean)
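
The default grouping column is called Group.1. A small variation, sketched here with the same values as the table above rebuilt via data.frame, names the list element so the output column is called Name, and writes the formula with explicit columns instead of dropping Month by position. The names by_name and by_formula are illustrative.

```r
# Same values as the example data above, rebuilt so this block is self-contained.
d <- data.frame(
  Name  = rep(c("Aira", "Ben", "Cat"), each = 3),
  Month = rep(1:3, times = 3),
  Rate1 = c(12, 18, 19, 53, 22, 19, 22, 67, 45),
  Rate2 = c(23, 73, 45, 19, 87, 45, 87, 43, 32)
)

# Naming the list element gives the grouping column a meaningful name.
by_name <- aggregate(d[, c("Rate1", "Rate2")], list(Name = d$Name), mean)

# Formula interface with the summarised columns spelled out.
by_formula <- aggregate(cbind(Rate1, Rate2) ~ Name, data = d, FUN = mean)

print(by_name)
print(by_formula)
```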

Calculate a weighted mean by group with dplyr (and replicate other approaches)

This commonly happens when plyr is loaded after dplyr, because plyr::summarise then masks dplyr::summarise. Calling dplyr::summarise explicitly fixes it, and it is the first thing to check when summarise returns unexpected results.

Another way is to detach the plyr package before using dplyr:

detach("package:plyr")
library("dplyr")
df %>%
  group_by(B) %>%
  summarise(wm = weighted.mean(A, P))
#       B    wm
#   <dbl> <dbl>
# 1    10   1.6
# 2    20   1.8
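
As a sanity check, weighted.mean(A, P) within a group equals sum(A * P) / sum(P) for that group. The sketch below uses hypothetical values for A, P and B (the original df is not shown) and computes the weighted mean both ways.

```r
library(dplyr)

# Hypothetical data; not the df from the question.
df <- data.frame(
  A = c(1, 2, 3, 4),   # values
  P = c(1, 3, 1, 1),   # weights
  B = c(10, 10, 20, 20)
)

res <- df %>%
  group_by(B) %>%
  dplyr::summarise(
    wm        = weighted.mean(A, P),
    wm_manual = sum(A * P) / sum(P)  # same quantity written out by hand
  )
print(res)
```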

Calculate a mean by groups in R

The idea is to reshape the data from wide format into long format, then group it and summarize it as follows:

library(dplyr)
library(tidyr)

homicide_ratios <-
  data.frame(
    Mainland = c("Europe", "Asia", "Oceania", "Americas", "Africa"),
    "1990" = c(1, 2, 3, 4, 5),
    "1991" = c(1, 2, 3, 4, 5),
    "1992" = c(1, 2, 3, 4, 5),
    "1993" = c(1, 2, 3, 4, 5)
  )

homicide_ratios %>%
  gather(key = "year", value = "rate", -Mainland) %>%
  group_by(Mainland, year) %>%
  summarize(average = mean(rate))

# # A tibble: 20 x 3
# # Groups: Mainland [5]
# Mainland year average
# <fct> <chr> <dbl>
# Africa X1990 5
# Africa X1991 5
# Africa X1992 5
# Africa X1993 5
# Americas X1990 4
# Americas X1991 4
# Americas X1992 4
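
In current tidyr, gather is superseded by pivot_longer. The following sketch shows the same reshape with pivot_longer, under two assumptions that differ from the code above: check.names = FALSE is passed so the year columns keep their names without the X prefix, and the grouping is by Mainland only, which yields one mean per region across all years (if a mean per Mainland and year is wanted, keep both grouping variables).

```r
library(dplyr)
library(tidyr)

homicide_ratios <- data.frame(
  Mainland = c("Europe", "Asia", "Oceania", "Americas", "Africa"),
  "1990" = c(1, 2, 3, 4, 5),
  "1991" = c(1, 2, 3, 4, 5),
  "1992" = c(1, 2, 3, 4, 5),
  "1993" = c(1, 2, 3, 4, 5),
  check.names = FALSE  # keep "1990" rather than "X1990"
)

# Wide to long: one row per Mainland and year.
long <- homicide_ratios %>%
  pivot_longer(cols = -Mainland, names_to = "year", values_to = "rate")

# One mean per Mainland, averaged over the years.
by_mainland <- long %>%
  group_by(Mainland) %>%
  summarise(average = mean(rate), .groups = "drop")

print(by_mainland)
```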

How to calculate mean of all columns, by group?

Recent versions of dplyr (1.0.0 and later) recommend regular summarise combined with across:

library(dplyr)
mtcars %>%
  group_by(cyl, gear) %>%
  summarise(across(everything(), mean))
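
When the data contains non-numeric columns or missing values, across can be restricted to numeric columns and given a function that drops NAs. This is a hedged sketch using the built-in iris data (chosen here only for illustration); the mean_ prefix set through .names is an arbitrary naming choice.

```r
library(dplyr)

res <- iris %>%
  group_by(Species) %>%
  summarise(
    across(
      where(is.numeric),            # only numeric columns
      ~ mean(.x, na.rm = TRUE),     # mean that ignores missing values
      .names = "mean_{.col}"        # e.g. mean_Sepal.Length
    ),
    .groups = "drop"
  )
print(res)
```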

In older versions of dplyr, the same result comes from summarise_all (see ?summarise_all). summarise_each is deprecated, and the across form above is now preferred.

Full code:

library(dplyr)
mtcars %>%
  group_by(cyl, gear) %>%
  summarise_all("mean")

# Source: local data frame [8 x 11]
# Groups: cyl [?]
#
# cyl gear mpg disp hp drat wt qsec vs am carb
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 4 3 21.500 120.1000 97.0000 3.700000 2.465000 20.0100 1.0 0.00 1.000000
# 2 4 4 26.925 102.6250 76.0000 4.110000 2.378125 19.6125 1.0 0.75 1.500000
# 3 4 5 28.200 107.7000 102.0000 4.100000 1.826500 16.8000 0.5 1.00 2.000000
# 4 6 3 19.750 241.5000 107.5000 2.920000 3.337500 19.8300 1.0 0.00 1.000000
# 5 6 4 19.750 163.8000 116.5000 3.910000 3.093750 17.6700 0.5 0.50 4.000000
# 6 6 5 19.700 145.0000 175.0000 3.620000 2.770000 15.5000 0.0 1.00 6.000000
# 7 8 3 15.050 357.6167 194.1667 3.120833 4.104083 17.1425 0.0 0.00 3.083333
# 8 8 5 15.400 326.0000 299.5000 3.880000 3.370000 14.5500 0.0 1.00 6.000000

