Total Mean & Mean by groups in R with dplyr
Try this:
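library(dplyr)
df %>%
  mutate(avg = mean(speed)) %>%        # overall mean, computed before grouping
  group_by(dive) %>%
  summarise(Avg_group = mean(speed),   # mean within each dive
            Total_Mean = first(avg))   # overall mean carried into each group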
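A minimal sketch of why this works, using a hypothetical toy df (the dive and speed names only mirror the snippet above): because mutate() runs before group_by(), avg is the mean of every row, and first(avg) carries that single value into each group's summary row.

```r
library(dplyr)

# Hypothetical toy data; column names mirror the snippet above
df <- data.frame(
  dive  = c("a", "a", "b", "b", "b"),
  speed = c(1, 3, 2, 4, 6)
)

res <- df %>%
  mutate(avg = mean(speed)) %>%         # ungrouped: mean of all 5 rows = 3.2
  group_by(dive) %>%
  summarise(Avg_group = mean(speed),    # a: 2, b: 4
            Total_Mean = first(avg))    # 3.2 repeated for every group

res
```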
Calculate mean by group using dplyr package
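The reason could be that we accidentally loaded the plyr library, which also has a summarise function and masks dplyr's version when plyr is attached after dplyr. Calling dplyr::summarize explicitly avoids the clash: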
library(dplyr)
library(ggplot2)  # provides the diamonds data
diamonds %>%
  group_by(cut) %>%
  dplyr::summarize(Mean = mean(price, na.rm = TRUE))
# A tibble: 5 x 2
# cut Mean
# <ord> <dbl>
#1 Fair 4358.758
#2 Good 3928.864
#3 Very Good 3981.760
#4 Premium 4584.258
#5 Ideal 3457.542
If we use plyr::summarise instead, the dplyr grouping is ignored and we get a single overall mean:
diamonds %>%
group_by(cut) %>%
plyr::summarize(Mean = mean(price, na.rm=TRUE))
# Mean
#1 3932.8
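To check which package a bare summarise() call will resolve to in the current session, one option is to inspect the function's enclosing environment. This sketch assumes only dplyr is attached; if plyr had been attached after dplyr, the same check would report "plyr" instead.

```r
library(dplyr)

# The environment of an exported function is its package namespace;
# environmentName() turns that into the package name.
which_summarise <- environmentName(environment(summarise))
which_summarise   # "dplyr" when plyr is not masking it

# Namespacing the call sidesteps any masking, whatever the load order
res <- mtcars %>%
  group_by(cyl) %>%
  dplyr::summarise(Mean = mean(mpg, na.rm = TRUE))
```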
How to use group_by with mean and sum in dplyr?
If I understood correctly, this might help you
#Libraries
library(tidyverse)
library(lubridate)
#Data
df <-
tibble::tribble(
~Year, ~School.Name, ~Student.Score1, ~Student.Score2,
2019L, "ISD 1", 1L, NA,
2020L, "ISD 4", 4L, 2L,
2020L, "ISD 3", NA, 3L,
2018L, "ISD 1", 4L, NA,
2019L, "ISD 4", 2L, 5L,
2020L, "ISD 4", 3L, 2L,
2019L, "ISD 3", NA, 1L,
2018L, "ISD 1", 2L, 4L
)
# Count rows and average each score column per Year/School.Name
df %>%
  group_by(Year, School.Name) %>%
  summarise(
    n = n(),
    across(.cols = contains(".Score"), .fns = function(x) mean(x, na.rm = TRUE))
  )
# A tibble: 6 x 5
# Groups: Year [3]
#    Year School.Name     n Student.Score1 Student.Score2
#   <int> <chr>       <int>          <dbl>          <dbl>
# 1  2018 ISD 1           2            3              4
# 2  2019 ISD 1           1            1            NaN
# 3  2019 ISD 3           1          NaN              1
# 4  2019 ISD 4           1            2              5
# 5  2020 ISD 3           1          NaN              3
# 6  2020 ISD 4           2            3.5            2
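The NaN cells above are groups where every value of that score is missing: with na.rm = TRUE, mean() is left with an empty vector and returns NaN (0 / 0). If NA reads better downstream, a small wrapper can convert it. mean_or_na is a hypothetical helper name, not part of dplyr.

```r
# mean() of an all-missing vector with na.rm = TRUE gives NaN
all_missing <- mean(c(NA_real_, NA_real_), na.rm = TRUE)

# Hypothetical helper: return NA instead of NaN when nothing is left to average
mean_or_na <- function(x) {
  x <- x[!is.na(x)]                      # is.na() also catches NaN
  if (length(x) == 0) NA_real_ else mean(x)
}

mean_or_na(c(NA, NA))    # NA
mean_or_na(c(4, NA, 2))  # 3
```

It could be passed to the pipeline above as .fns = mean_or_na.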
Calculate Group Mean and Overall Mean
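Here is one more dplyr solution:
library(dplyr)
# Ratio of each group's column means to the overall column means.
# Assumes Y has exactly two groups, so the overall means end up in row 3.
index <- Data %>%
  group_by(Y) %>%
  summarise(across(everything(), mean)) %>%
  select(-Y) %>%
  bind_rows(Data %>% select(-Y) %>% summarise(across(everything(), mean))) %>%
  mutate(across(everything(), ~ .x / .x[3])) %>%
  as.data.frame()
index <- index[1:2, ]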
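The row-3 indexing above only works when Y has exactly two groups. A sketch that does not depend on row position, assuming a hypothetical toy Data with a grouping column Y and numeric columns a and b: each group mean is divided by the overall mean of the same column, looked up with cur_column() (available in dplyr 1.0 and later).

```r
library(dplyr)

# Hypothetical toy data standing in for the question's Data
Data <- data.frame(
  Y = c("g1", "g1", "g2", "g2"),
  a = c(1, 3, 5, 7),   # overall mean 4
  b = c(1, 1, 4, 6)    # overall mean 3
)

ratios <- Data %>%
  group_by(Y) %>%
  summarise(across(everything(), mean)) %>%              # group means of a and b
  mutate(across(-Y, ~ .x / mean(Data[[cur_column()]])))  # divide by overall mean

ratios
```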
Mean per group in a data.frame
This type of operation is exactly what aggregate was designed for:
d <- read.table(text=
'Name Month Rate1 Rate2
Aira 1 12 23
Aira 2 18 73
Aira 3 19 45
Ben 1 53 19
Ben 2 22 87
Ben 3 19 45
Cat 1 22 87
Cat 2 67 43
Cat 3 45 32', header=TRUE)
aggregate(d[, 3:4], list(d$Name), mean)
Group.1 Rate1 Rate2
1 Aira 16.33333 47.00000
2 Ben 31.33333 50.33333
3 Cat 44.66667 54.00000
Here we aggregate columns 3 and 4 of data.frame d, grouping by d$Name, and applying the mean function.
Or, using a formula interface (d[-2] drops the Month column so it is not averaged):
aggregate(. ~ Name, d[-2], mean)
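For comparison, a base R sketch that gets the same group means without aggregate(): split the rows by Name and take colMeans() of each piece. The data is re-created so the block runs on its own.

```r
d <- read.table(text =
'Name Month Rate1 Rate2
Aira 1 12 23
Aira 2 18 73
Aira 3 19 45
Ben 1 53 19
Ben 2 22 87
Ben 3 19 45
Cat 1 22 87
Cat 2 67 43
Cat 3 45 32', header = TRUE)

# One data frame per Name, then column means of Rate1/Rate2 within each piece
pieces <- split(d[, c("Rate1", "Rate2")], d$Name)
group_means <- do.call(rbind, lapply(pieces, colMeans))
group_means
```

The result is a matrix with one row per Name, rather than a data.frame with a Group.1 column.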
Calculate a mean by groups in R
The idea is to reshape the data from wide format into long format, then group it and summarize it as follows:
library(dplyr)
library(tidyr)
homicide_ratios <-
data.frame(
Mainland = c("Europe", "Asia", "Oceania", "Americas", "Africa"),
"1990" = c(1, 2, 3, 4, 5),
"1991" = c(1, 2, 3, 4, 5),
"1992" = c(1, 2, 3, 4, 5),
"1993" = c(1, 2, 3, 4, 5)
)
homicide_ratios %>%
gather(key = "year", value = "rate", -Mainland) %>%
group_by(Mainland, year) %>%
summarize(average = mean(rate))
# # A tibble: 20 x 3
# # Groups: Mainland [5]
# Mainland year average
# <fct> <chr> <dbl>
# Africa X1990 5
# Africa X1991 5
# Africa X1992 5
# Africa X1993 5
# Americas X1990 4
# Americas X1991 4
# Americas X1992 4
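gather() still works but is superseded in tidyr 1.0 and later. A sketch of the same reshape with pivot_longer(), trimmed to two years for brevity; check.names = FALSE is an addition here that keeps the year names as "1990" rather than the "X1990" that data.frame() produces by default.

```r
library(dplyr)
library(tidyr)

homicide_ratios <- data.frame(
  Mainland = c("Europe", "Asia", "Oceania", "Americas", "Africa"),
  "1990" = c(1, 2, 3, 4, 5),
  "1991" = c(1, 2, 3, 4, 5),
  check.names = FALSE   # keep "1990" instead of "X1990"
)

res <- homicide_ratios %>%
  pivot_longer(-Mainland, names_to = "year", values_to = "rate") %>%
  group_by(Mainland, year) %>%
  summarise(average = mean(rate), .groups = "drop")

res
```

Dropping year from group_by() would instead average each Mainland across all years.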
dplyr: mean of a group count
We can use dplyr methods:
library(dplyr)
x %>%
group_by(`% Bucket`) %>%
summarise(count= mean(count))
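Since x is not shown, here is a sketch on a hypothetical x. The backticks are needed because % Bucket is not a syntactic R name, and check.names = FALSE keeps data.frame() from rewriting it.

```r
library(dplyr)

# Hypothetical data; column names mirror the snippet above
x <- data.frame(
  `% Bucket` = c("0-10", "0-10", "10-20"),
  count      = c(4, 6, 9),
  check.names = FALSE
)

res <- x %>%
  group_by(`% Bucket`) %>%
  summarise(count = mean(count))   # "0-10": 5, "10-20": 9

res
```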
How to calculate mean of all columns, by group?
Edit 2: Recent versions of dplyr suggest using the regular summarise with the across function, as in:
library(dplyr)
mtcars %>%
group_by(cyl, gear) %>%
summarise(across(everything(), mean))
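everything() works on mtcars because every non-grouping column is numeric; on a data frame that also holds text columns, mean() would return NA with a warning for those. A sketch on hypothetical data that restricts the summary to numeric columns with where(is.numeric):

```r
library(dplyr)

# Hypothetical data with a grouping column, a text column and a numeric column
df <- data.frame(
  g     = c("a", "a", "b"),
  label = c("x", "y", "z"),
  v     = c(1, 3, 5)
)

res <- df %>%
  group_by(g) %>%
  summarise(across(where(is.numeric), mean))   # label is skipped

res
```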
What you're looking for is either ?summarise_all or ?summarise_each from dplyr (summarise_each has since been deprecated, and summarise_all is superseded by across).
Edit: full code:
library(dplyr)
mtcars %>%
group_by(cyl, gear) %>%
summarise_all("mean")
# Source: local data frame [8 x 11]
# Groups: cyl [?]
#
# cyl gear mpg disp hp drat wt qsec vs am carb
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 4 3 21.500 120.1000 97.0000 3.700000 2.465000 20.0100 1.0 0.00 1.000000
# 2 4 4 26.925 102.6250 76.0000 4.110000 2.378125 19.6125 1.0 0.75 1.500000
# 3 4 5 28.200 107.7000 102.0000 4.100000 1.826500 16.8000 0.5 1.00 2.000000
# 4 6 3 19.750 241.5000 107.5000 2.920000 3.337500 19.8300 1.0 0.00 1.000000
# 5 6 4 19.750 163.8000 116.5000 3.910000 3.093750 17.6700 0.5 0.50 4.000000
# 6 6 5 19.700 145.0000 175.0000 3.620000 2.770000 15.5000 0.0 1.00 6.000000
# 7 8 3 15.050 357.6167 194.1667 3.120833 4.104083 17.1425 0.0 0.00 3.083333
# 8 8 5 15.400 326.0000 299.5000 3.880000 3.370000 14.5500 0.0 1.00 6.000000
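The same table can be sketched in base R with the formula interface of aggregate(), where . on the left-hand side means every column not used for grouping:

```r
# Mean of every other mtcars column for each cyl/gear combination
res <- aggregate(. ~ cyl + gear, data = mtcars, FUN = mean)

# Pull out one combination to compare with the dplyr output
res[res$cyl == 4 & res$gear == 3, c("cyl", "gear", "mpg", "hp")]
```

Rows come back ordered by gear and then cyl, unlike the dplyr output, so compare by cyl/gear values rather than by row position.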
Using R & dplyr to summarize - group_by, count, mean, sd
Even though answered via comments, I felt such a nice reproducible example for a very first question deserved an official answer.
library(dplyr)
set.seed(123)
var1 <- rnorm(15, mean=2, sd=1)
var2 <- c(rep("A", 5), rep("B", 5), rep("C", 5))
df <- data.frame(var1, var2)
df_stat <- df %>% group_by(var2) %>% summarize(
count = n(),
mean = mean(var1, na.rm = TRUE),
sd = sd(var1, na.rm = TRUE))
head(df_stat)
# A tibble: 3 x 4
# var2 count mean sd
# <fct> <int> <dbl> <dbl>
# 1 A 5 2.19 0.811
# 2 B 5 1.96 1.16
# 3 C 5 2.31 0.639
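As a cross-check, the same per-group statistics can be sketched in base R with tapply(), one call per statistic; var1 and var2 are rebuilt exactly as in the snippet above.

```r
set.seed(123)
var1 <- rnorm(15, mean = 2, sd = 1)
var2 <- c(rep("A", 5), rep("B", 5), rep("C", 5))

# Base R cross-check: group var1 by var2 for each statistic
counts <- tapply(var1, var2, length)
means  <- tapply(var1, var2, mean)
sds    <- tapply(var1, var2, sd)

round(cbind(count = counts, mean = means, sd = sds), 3)
```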