How to calculate mean of all columns, by group?
Edit 2: Recent versions of dplyr suggest using regular summarise with the across function, as in:
library(dplyr)
mtcars %>%
  group_by(cyl, gear) %>%
  summarise(across(everything(), mean))
What you're looking for is either ?summarise_all or ?summarise_each from dplyr.
Edit: full code:
library(dplyr)
mtcars %>%
  group_by(cyl, gear) %>%
  summarise_all("mean")
# Source: local data frame [8 x 11]
# Groups: cyl [?]
#
# cyl gear mpg disp hp drat wt qsec vs am carb
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 4 3 21.500 120.1000 97.0000 3.700000 2.465000 20.0100 1.0 0.00 1.000000
# 2 4 4 26.925 102.6250 76.0000 4.110000 2.378125 19.6125 1.0 0.75 1.500000
# 3 4 5 28.200 107.7000 102.0000 4.100000 1.826500 16.8000 0.5 1.00 2.000000
# 4 6 3 19.750 241.5000 107.5000 2.920000 3.337500 19.8300 1.0 0.00 1.000000
# 5 6 4 19.750 163.8000 116.5000 3.910000 3.093750 17.6700 0.5 0.50 4.000000
# 6 6 5 19.700 145.0000 175.0000 3.620000 2.770000 15.5000 0.0 1.00 6.000000
# 7 8 3 15.050 357.6167 194.1667 3.120833 4.104083 17.1425 0.0 0.00 3.083333
# 8 8 5 15.400 326.0000 299.5000 3.880000 3.370000 14.5500 0.0 1.00 6.000000
How to average all columns in dataset by group
You can use summarise_all instead of multiple uses of summarise:
library(dplyr)
data %>%
  group_by(ID) %>%
  summarise_all(mean)
# A tibble: 3 x 4
ID Tr1 Tr2 Tr3
<int> <dbl> <dbl> <dbl>
1 1 4 4.33 8
2 4 3.5 3.5 6
3 6 3.67 5.33 6.33
Mean per group in a data.frame
This type of operation is exactly what aggregate was designed for:
d <- read.table(text=
'Name Month Rate1 Rate2
Aira 1 12 23
Aira 2 18 73
Aira 3 19 45
Ben 1 53 19
Ben 2 22 87
Ben 3 19 45
Cat 1 22 87
Cat 2 67 43
Cat 3 45 32', header=TRUE)
aggregate(d[, 3:4], list(d$Name), mean)
Group.1 Rate1 Rate2
1 Aira 16.33333 47.00000
2 Ben 31.33333 50.33333
3 Cat 44.66667 54.00000
Here we aggregate columns 3 and 4 of data.frame d, grouping by d$Name, and applying the mean function.
Or, using a formula interface:
aggregate(. ~ Name, d[-2], mean)
Group pandas dataframe and calculate mean for multiple columns
df.groupby("category", as_index=False).mean()
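A minimal, self-contained sketch of the one-liner above. The frame is made up for illustration: the column names category, x and y and all values are assumptions, not data from the question.

```python
import pandas as pd

# Hypothetical data: two groups, two numeric columns.
df = pd.DataFrame({
    "category": ["a", "a", "b", "b"],
    "x": [1.0, 3.0, 10.0, 20.0],
    "y": [2.0, 4.0, 30.0, 50.0],
})

# as_index=False keeps "category" as a regular column instead of the index.
means = df.groupby("category", as_index=False).mean()
print(means)
```

If the frame also holds non-numeric columns besides the grouping key, recent pandas versions may raise on mean(); selecting the numeric columns first, or passing numeric_only=True, is one way around that.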
Group by columns under conditions to calculate average
Use DataFrame.pivot_table with a helper column new that is a copy of ColB, then flatten the resulting MultiIndex columns and join the output to a new DataFrame created by aggregating Counter with sum:
df1 = (df.assign(new=df['ColB'])
         .pivot_table(index=['ColA', 'ColB'],
                      columns='new',
                      values=['interval', 'duration'],
                      fill_value=0,
                      aggfunc='mean'))
df1.columns = df1.columns.map(lambda x: f'{x[0]}{x[1]}')
df = (df.groupby(['ColA', 'ColB'])['Counter']
        .sum()
        .to_frame(name='SumCounter')
        .join(df1).reset_index())
print(df)
ColA ColB SumCounter durationSD durationUD intervalSD intervalUD
0 A SD 3 2.5 0.0 3.5 0
1 A UD 10 0.0 2.0 0.0 1
2 B SD 32 2.0 0.0 3.5 0
3 B UD 4 0.0 1.5 0.0 2
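The answer does not show its input frame, so here is a reproducible sketch of the same steps on a small made-up input. The rows and values are assumptions chosen for illustration and will not reproduce the table above; only the column names come from the answer.

```python
import pandas as pd

# Hypothetical input; only the column names are taken from the answer.
df = pd.DataFrame({
    "ColA": ["A", "A", "A", "B"],
    "ColB": ["SD", "SD", "UD", "SD"],
    "interval": [3, 4, 1, 5],
    "duration": [2, 3, 2, 4],
    "Counter": [1, 2, 10, 7],
})

# Pivot on a copy of ColB so ColB can stay in the index as well.
wide = (df.assign(new=df["ColB"])
          .pivot_table(index=["ColA", "ColB"], columns="new",
                       values=["interval", "duration"],
                       fill_value=0, aggfunc="mean"))

# Flatten the two-level columns: ("duration", "SD") -> "durationSD".
wide.columns = [f"{value}{label}" for value, label in wide.columns]

# Sum Counter per group and attach the flattened means.
result = (df.groupby(["ColA", "ColB"])["Counter"].sum()
            .to_frame(name="SumCounter")
            .join(wide)
            .reset_index())
print(result)
```

Combinations that never occur within a group, such as a UD mean on an SD row, are filled with 0 by fill_value=0.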
How to calculate mean values grouped on another column in Pandas
You could groupby on StationID and then take mean() on BiasTemp. To output a DataFrame, use as_index=False:
In [4]: df.groupby('StationID', as_index=False)['BiasTemp'].mean()
Out[4]:
StationID BiasTemp
0 BB 5.0
1 KEOPS 2.5
2 SS0279 15.0
Without as_index=False, it returns a Series instead:
In [5]: df.groupby('StationID')['BiasTemp'].mean()
Out[5]:
StationID
BB 5.0
KEOPS 2.5
SS0279 15.0
Name: BiasTemp, dtype: float64
Read more about groupby in this pydata tutorial.
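A small runnable sketch of the two return types. The readings below are made up so that the group means match the output above, and reset_index() is shown as one assumed alternative route from the Series to a frame.

```python
import pandas as pd

# Hypothetical station readings; values are made up for illustration.
df = pd.DataFrame({
    "StationID": ["BB", "KEOPS", "KEOPS", "SS0279"],
    "BiasTemp": [5.0, 2.0, 3.0, 15.0],
})

as_frame = df.groupby("StationID", as_index=False)["BiasTemp"].mean()
as_series = df.groupby("StationID")["BiasTemp"].mean()

# reset_index() moves StationID out of the Series index into a column.
from_series = as_series.reset_index()

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series
print(from_series)
```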
Means multiple columns by multiple groups
We can use dplyr with summarise_at to get the mean of the columns of interest after grouping by the relevant columns:
library(dplyr)
airquality %>%
  group_by(City, year) %>%
  summarise_at(vars("PM25", "Ozone", "CO2"), mean)
Or, using the devel version of dplyr (version ‘0.8.99.9000’):
airquality %>%
  group_by(City, year) %>%
  summarise(across(PM25:CO2, mean))