Calculate mean by group using dplyr package
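The reason could be that the plyr package was accidentally loaded as well. plyr also provides a summarise/summarize function, and when plyr is attached after dplyr it masks the dplyr version. Calling the dplyr function explicitly gives the grouped result: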
library(dplyr)
library(ggplot2)  # provides the diamonds dataset
diamonds %>%
  group_by(cut) %>%
  dplyr::summarize(Mean = mean(price, na.rm = TRUE))
# A tibble: 5 x 2
# cut Mean
# <ord> <dbl>
#1 Fair 4358.758
#2 Good 3928.864
#3 Very Good 3981.760
#4 Premium 4584.258
#5 Ideal 3457.542
If we use plyr::summarize instead, the grouping is ignored and a single overall mean is returned:
diamonds %>%
  group_by(cut) %>%
  plyr::summarize(Mean = mean(price, na.rm = TRUE))
# Mean
#1 3932.8
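To confirm which package a bare summarise call resolves to, one option is to inspect the environment the function was defined in. This is a minimal sketch, assuming dplyr is installed; the helper name which_pkg is hypothetical and not part of either package.

```r
library(dplyr)

# Hypothetical helper: report the namespace in which a function was defined.
which_pkg <- function(f) environmentName(environment(f))

which_pkg(summarise)         # "dplyr" unless another package has masked it
which_pkg(dplyr::summarise)  # "dplyr", regardless of load order
```

If the first call reports "plyr", the bare summarise is the masked version and the dplyr:: prefix is needed.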
Calculate mean by group with dplyr
You were almost there:
Data %>%
  group_by(CodeProject) %>%
  summarise(
    n = n(),                              # number of rows per project
    mean_pr = mean(Price, na.rm = TRUE))  # mean price, ignoring NA
# A tibble: 2 x 3
# CodeProject n mean_pr
# <fct> <int> <dbl>
#1 Pr1 3 4.00
#2 Pr2 2 7.50
Calculating mean by group using dplyr in R
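We can use group_by() with mutate(), which adds the group mean as a new column while keeping every original row: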
library(dplyr)
df <- df %>%
  group_by(class) %>%
  mutate(Mean = mean(x)) %>%
  ungroup()
-output
df
# A tibble: 6 x 3
x class Mean
<dbl> <dbl> <dbl>
1 2.43 1 1.05
2 0.0625 1 1.05
3 0.669 1 1.05
4 0.195 2 -0.0550
5 0.285 2 -0.0550
6 -0.644 2 -0.0550
data
df <- data.frame(x, class)  # assumes the vectors x and class already exist in the workspace
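The difference between mutate() and summarise() after group_by() can be seen on a built-in dataset. This is a small sketch using mtcars (not the data from the answer above); the object names with_mean and per_group are illustrative only.

```r
library(dplyr)

# mutate() keeps every row and repeats the group mean on each one
with_mean <- mtcars %>%
  group_by(cyl) %>%
  mutate(mean_mpg = mean(mpg)) %>%
  ungroup()

# summarise() collapses the data to one row per group
per_group <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

nrow(with_mean)  # 32, same as nrow(mtcars)
nrow(per_group)  # 3, one row per distinct cyl value
```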
How to calculate mean by row for multiple groups using dplyr in R?
We may use %in% or == to subset 'Value' based on the 'Distance' values (assuming the floating-point precision is exact) after grouping by 'Age' and 'Location':
library(dplyr)
df1 %>%
  group_by(Age, Location) %>%
  summarise(Mean_0.5 = mean(Value[Distance == 0.5]),
            Mean_1.5_and_2.5 = mean(Value[Distance %in% c(1.5, 2.5)]),
            .groups = 'drop')
-output
# A tibble: 4 × 4
Age Location Mean_0.5 Mean_1.5_and_2.5
<dbl> <chr> <dbl> <dbl>
1 1 Central 206. 202.
2 1 North 210. 201.
3 2 Central 193 186.
4 2 North 202. 214.
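The precision caveat matters when 'Distance' is the result of arithmetic rather than typed-in constants. One possible safeguard is dplyr::near(), which compares with a tolerance. The sketch below uses made-up data with the same column names; the values are hypothetical and not taken from the question.

```r
library(dplyr)

# Exact comparison can fail on values that look equal
0.1 + 0.2 == 0.3      # FALSE
near(0.1 + 0.2, 0.3)  # TRUE

# Hypothetical data with the same column names as above
df1 <- data.frame(
  Age      = c(1, 1, 1, 1),
  Location = c("North", "North", "North", "North"),
  Distance = c(0.5, 1.5, 2.5, 0.5),
  Value    = c(200, 210, 190, 220)
)

res <- df1 %>%
  group_by(Age, Location) %>%
  summarise(Mean_0.5 = mean(Value[near(Distance, 0.5)]),
            .groups = "drop")
res
```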
Mean per group in a data.frame
This type of operation is exactly what aggregate was designed for:
d <- read.table(text=
'Name Month Rate1 Rate2
Aira 1 12 23
Aira 2 18 73
Aira 3 19 45
Ben 1 53 19
Ben 2 22 87
Ben 3 19 45
Cat 1 22 87
Cat 2 67 43
Cat 3 45 32', header=TRUE)
aggregate(d[, 3:4], list(d$Name), mean)
Group.1 Rate1 Rate2
1 Aira 16.33333 47.00000
2 Ben 31.33333 50.33333
3 Cat 44.66667 54.00000
Here we aggregate columns 3 and 4 of data.frame d, grouping by d$Name, and applying the mean function.
Or, using a formula interface:
aggregate(. ~ Name, d[-2], mean)
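For comparison with the dplyr answers elsewhere on this page, the same per-name means could also be written with group_by() and summarise(). This is a sketch, assuming dplyr 1.0.0 or later for across(); the object name res is illustrative.

```r
library(dplyr)

d <- read.table(text =
'Name Month Rate1 Rate2
Aira 1 12 23
Aira 2 18 73
Aira 3 19 45
Ben 1 53 19
Ben 2 22 87
Ben 3 19 45
Cat 1 22 87
Cat 2 67 43
Cat 3 45 32', header = TRUE)

# One row per Name, with the mean of each rate column
res <- d %>%
  group_by(Name) %>%
  summarise(across(c(Rate1, Rate2), mean))
res
```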
Calculate a weighted mean by group with dplyr (and replicate other approaches)
This is a very common problem when the plyr package is loaded, because plyr::summarise can mask the dplyr::summarise function. Just call dplyr::summarise explicitly. It's the first thing to check if summarise outputs unexpected results.
Another way is to detach the plyr package before using dplyr:
detach("package:plyr")
library("dplyr")
df %>%
  group_by(B) %>%
  summarise(wm = weighted.mean(A, P))
# B wm
# <dbl> <dbl>
# 1 10 1.6
# 2 20 1.8
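As a sanity check, the grouped weighted mean can be compared with the formula sum(A * P) / sum(P) written out by hand. The data below are hypothetical (the original df is not shown here), with A as values, P as weights, and B as the grouping column.

```r
library(dplyr)

# Hypothetical data: A = values, P = weights, B = groups
df <- data.frame(
  A = c(1, 2, 1, 2),
  P = c(1, 3, 2, 2),
  B = c(10, 10, 20, 20)
)

res <- df %>%
  group_by(B) %>%
  dplyr::summarise(
    wm     = weighted.mean(A, P),
    manual = sum(A * P) / sum(P)  # the same quantity written out
  )
res
```

If wm and manual disagree, or only one row comes back, the summarise being called is probably not the dplyr one.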
Calculate a mean by groups in R
The idea is to reshape the data from wide format into long format, then group the data and summarize it as follows:
library(dplyr)
library(tidyr)
homicide_ratios <-
  data.frame(
    Mainland = c("Europe", "Asia", "Oceania", "Americas", "Africa"),
    "1990" = c(1, 2, 3, 4, 5),
    "1991" = c(1, 2, 3, 4, 5),
    "1992" = c(1, 2, 3, 4, 5),
    "1993" = c(1, 2, 3, 4, 5)
  )
homicide_ratios %>%
  gather(key = "year", value = "rate", -Mainland) %>%
  group_by(Mainland, year) %>%
  summarize(average = mean(rate))
# # A tibble: 20 x 3
# # Groups: Mainland [5]
# Mainland year average
# <fct> <chr> <dbl>
# Africa X1990 5
# Africa X1991 5
# Africa X1992 5
# Africa X1993 5
# Americas X1990 4
# Americas X1991 4
# Americas X1992 4
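In tidyr 1.0.0 and later, pivot_longer() is the recommended replacement for gather(). The sketch below repeats the same reshaping with it, using names_prefix to strip the "X" that data.frame() prepends to the numeric column names; the object name long_means is illustrative.

```r
library(dplyr)
library(tidyr)

homicide_ratios <- data.frame(
  Mainland = c("Europe", "Asia", "Oceania", "Americas", "Africa"),
  "1990" = c(1, 2, 3, 4, 5),
  "1991" = c(1, 2, 3, 4, 5),
  "1992" = c(1, 2, 3, 4, 5),
  "1993" = c(1, 2, 3, 4, 5)
)

long_means <- homicide_ratios %>%
  pivot_longer(cols = -Mainland,
               names_to = "year",
               values_to = "rate",
               names_prefix = "X") %>%  # "X1990" becomes "1990"
  group_by(Mainland, year) %>%
  summarise(average = mean(rate), .groups = "drop")
long_means
```

Grouping by Mainland alone, instead of Mainland and year, would average each region across the four years.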
How to calculate mean of all columns, by group?
Edit 2: Recent versions of dplyr (1.0.0 and later) suggest using regular summarise with the across function, as in:
library(dplyr)
mtcars %>%
  group_by(cyl, gear) %>%
  summarise(across(everything(), mean))
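everything() works for mtcars because all of its columns are numeric. For a data frame that also has non-numeric columns, one possible variation is to select with where(is.numeric). This is a sketch on the built-in iris data, not part of the original answer.

```r
library(dplyr)

# Mean of every numeric column, by Species
res <- iris %>%
  group_by(Species) %>%
  summarise(across(where(is.numeric), mean), .groups = "drop")
res
```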
What you're looking for is either ?summarise_all or ?summarise_each from dplyr.
Edit: full code:
library(dplyr)
mtcars %>%
  group_by(cyl, gear) %>%
  summarise_all("mean")
# Source: local data frame [8 x 11]
# Groups: cyl [?]
#
# cyl gear mpg disp hp drat wt qsec vs am carb
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 4 3 21.500 120.1000 97.0000 3.700000 2.465000 20.0100 1.0 0.00 1.000000
# 2 4 4 26.925 102.6250 76.0000 4.110000 2.378125 19.6125 1.0 0.75 1.500000
# 3 4 5 28.200 107.7000 102.0000 4.100000 1.826500 16.8000 0.5 1.00 2.000000
# 4 6 3 19.750 241.5000 107.5000 2.920000 3.337500 19.8300 1.0 0.00 1.000000
# 5 6 4 19.750 163.8000 116.5000 3.910000 3.093750 17.6700 0.5 0.50 4.000000
# 6 6 5 19.700 145.0000 175.0000 3.620000 2.770000 15.5000 0.0 1.00 6.000000
# 7 8 3 15.050 357.6167 194.1667 3.120833 4.104083 17.1425 0.0 0.00 3.083333
# 8 8 5 15.400 326.0000 299.5000 3.880000 3.370000 14.5500 0.0 1.00 6.000000