How to Calculate Mean of All Columns, by Group

How to calculate mean of all columns, by group?

Edit 2: Recent versions of dplyr (1.0.0 and later) recommend using the regular summarise() together with across(), as in:

library(dplyr)
mtcars %>%
  group_by(cyl, gear) %>%
  summarise(across(everything(), mean))

For older versions of dplyr, what you're looking for is either ?summarise_all or ?summarise_each (summarise_all is now superseded by across(), and summarise_each is deprecated).

Edit: full code:

library(dplyr)
mtcars %>%
  group_by(cyl, gear) %>%
  summarise_all("mean")

# Source: local data frame [8 x 11]
# Groups: cyl [?]
#
#     cyl  gear    mpg     disp       hp     drat       wt    qsec    vs    am     carb
#   <dbl> <dbl>  <dbl>    <dbl>    <dbl>    <dbl>    <dbl>   <dbl> <dbl> <dbl>    <dbl>
# 1     4     3 21.500 120.1000  97.0000 3.700000 2.465000 20.0100   1.0  0.00 1.000000
# 2     4     4 26.925 102.6250  76.0000 4.110000 2.378125 19.6125   1.0  0.75 1.500000
# 3     4     5 28.200 107.7000 102.0000 4.100000 1.826500 16.8000   0.5  1.00 2.000000
# 4     6     3 19.750 241.5000 107.5000 2.920000 3.337500 19.8300   1.0  0.00 1.000000
# 5     6     4 19.750 163.8000 116.5000 3.910000 3.093750 17.6700   0.5  0.50 4.000000
# 6     6     5 19.700 145.0000 175.0000 3.620000 2.770000 15.5000   0.0  1.00 6.000000
# 7     8     3 15.050 357.6167 194.1667 3.120833 4.104083 17.1425   0.0  0.00 3.083333
# 8     8     5 15.400 326.0000 299.5000 3.880000 3.370000 14.5500   0.0  1.00 6.000000

How to average all columns in dataset by group

You can use summarise_all instead of multiple uses of summarise:

library(dplyr)

data %>%
  group_by(ID) %>%
  summarise_all(mean)

# A tibble: 3 x 4
     ID   Tr1   Tr2   Tr3
  <int> <dbl> <dbl> <dbl>
1     1  4     4.33  8
2     4  3.5   3.5   6
3     6  3.67  5.33  6.33

Mean per group in a data.frame

This type of operation is exactly what aggregate was designed for:

d <- read.table(text=
'Name Month Rate1 Rate2
Aira 1 12 23
Aira 2 18 73
Aira 3 19 45
Ben 1 53 19
Ben 2 22 87
Ben 3 19 45
Cat 1 22 87
Cat 2 67 43
Cat 3 45 32', header=TRUE)

aggregate(d[, 3:4], list(d$Name), mean)

  Group.1    Rate1    Rate2
1    Aira 16.33333 47.00000
2     Ben 31.33333 50.33333
3     Cat 44.66667 54.00000

Here we aggregate columns 3 and 4 of data.frame d, grouping by d$Name, and applying the mean function.


Or, using the formula interface (d[-2] drops the Month column, so only the two rate columns are averaged by Name):

aggregate(. ~ Name, d[-2], mean)
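
For readers working in pandas rather than R, the same grouping can be sketched as follows. This is only an illustrative equivalent, not part of the original answer; it reuses the data shown above, and the variable names text and means are introduced here for the example.

```python
import io
import pandas as pd

# Same table as in the R example above, parsed from whitespace-separated text.
text = """Name Month Rate1 Rate2
Aira 1 12 23
Aira 2 18 73
Aira 3 19 45
Ben 1 53 19
Ben 2 22 87
Ben 3 19 45
Cat 1 22 87
Cat 2 67 43
Cat 3 45 32"""
d = pd.read_csv(io.StringIO(text), sep=r"\s+")

# Group by Name and average only the two rate columns (Month is left out).
means = d.groupby("Name")[["Rate1", "Rate2"]].mean()
print(means)
```

Selecting the columns explicitly plays the same role as d[, 3:4] or d[-2] in the aggregate calls.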

Group pandas dataframe and calculate mean for multiple columns

df.groupby("category", as_index=False).mean()
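
A minimal sketch of this call on a small made-up frame. The column names x and y and all values are hypothetical, chosen only for illustration. Passing numeric_only=True is an assumption added here so that any non-numeric columns would be skipped rather than cause an error in recent pandas versions.

```python
import pandas as pd

# Hypothetical data: "category" is the grouping key, "x" and "y" are numeric.
df = pd.DataFrame({
    "category": ["a", "a", "b", "b"],
    "x": [1.0, 3.0, 10.0, 20.0],
    "y": [2.0, 4.0, 30.0, 50.0],
})

# Mean of every remaining numeric column within each category.
# as_index=False keeps "category" as a regular column instead of the index.
means = df.groupby("category", as_index=False).mean(numeric_only=True)
print(means)
```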

Group by columns under conditions to calculate average

Use DataFrame.pivot_table with a helper column new that is a copy of ColB, then flatten the resulting MultiIndex columns and join the output to a new DataFrame created by aggregating Counter with sum:

df1 = (df.assign(new=df['ColB'])
         .pivot_table(index=['ColA', 'ColB'],
                      columns='new',
                      values=['interval', 'duration'],
                      fill_value=0,
                      aggfunc='mean'))
df1.columns = df1.columns.map(lambda x: f'{x[0]}{x[1]}')
df = (df.groupby(['ColA', 'ColB'])['Counter']
        .sum()
        .to_frame(name='SumCounter')
        .join(df1).reset_index())
print(df)
  ColA ColB  SumCounter  durationSD  durationUD  intervalSD  intervalUD
0    A   SD           3         2.5         0.0         3.5           0
1    A   UD          10         0.0         2.0         0.0           1
2    B   SD          32         2.0         0.0         3.5           0
3    B   UD           4         0.0         1.5         0.0           2
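
The answer does not show its input, so here is a self-contained sketch of the same steps on made-up rows. The values below are hypothetical and are not meant to reproduce the output above; the names wide and out are introduced here so the original frame is not overwritten.

```python
import pandas as pd

# Hypothetical input with the column names used in the answer.
df = pd.DataFrame({
    "ColA": ["A", "A", "A", "B"],
    "ColB": ["SD", "SD", "UD", "SD"],
    "Counter": [1, 2, 10, 32],
    "interval": [3.0, 4.0, 1.0, 3.5],
    "duration": [2.0, 3.0, 2.0, 2.0],
})

# Step 1: copy ColB into a helper column so it can serve both as an
# index level and as the pivoted column level.
wide = (df.assign(new=df["ColB"])
          .pivot_table(index=["ColA", "ColB"],
                       columns="new",
                       values=["interval", "duration"],
                       fill_value=0,
                       aggfunc="mean"))

# Step 2: flatten the two-level column index, e.g. ("duration", "SD") -> "durationSD".
wide.columns = wide.columns.map(lambda x: f"{x[0]}{x[1]}")

# Step 3: sum Counter per group and attach the pivoted means by index.
out = (df.groupby(["ColA", "ColB"])["Counter"]
         .sum()
         .to_frame(name="SumCounter")
         .join(wide)
         .reset_index())
print(out)
```

Each row ends up with the mean under the columns matching its own ColB value and the fill value 0 under the others.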

How to calculate mean values grouped on another column in Pandas

You could group by StationID and then take mean() of BiasTemp. To get a DataFrame as output, pass as_index=False:

In [4]: df.groupby('StationID', as_index=False)['BiasTemp'].mean()
Out[4]:
  StationID  BiasTemp
0        BB       5.0
1     KEOPS       2.5
2    SS0279      15.0

Without as_index=False, it returns a Series indexed by StationID instead:

In [5]: df.groupby('StationID')['BiasTemp'].mean()
Out[5]:
StationID
BB         5.0
KEOPS      2.5
SS0279    15.0
Name: BiasTemp, dtype: float64
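
To check the difference in return type yourself, here is a small sketch. The rows are hypothetical, since the original input is not shown, and the names as_frame, as_series and roundtrip are introduced only for this example.

```python
import pandas as pd

# Hypothetical readings; only the column names come from the answer.
df = pd.DataFrame({
    "StationID": ["BB", "KEOPS", "KEOPS", "SS0279"],
    "BiasTemp": [5.0, 2.0, 3.0, 15.0],
})

# as_index=False: the group key stays a regular column, result is a DataFrame.
as_frame = df.groupby("StationID", as_index=False)["BiasTemp"].mean()

# Default: the group key becomes the index, result is a Series.
as_series = df.groupby("StationID")["BiasTemp"].mean()

# reset_index() turns the Series form into a two-column DataFrame as well.
roundtrip = as_series.reset_index()

print(type(as_frame).__name__, type(as_series).__name__)
print(roundtrip)
```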

Read more about groupby in this pydata tutorial.

Means multiple columns by multiple groups

We can use summarise_at from dplyr to get the mean of the selected columns after grouping by the columns of interest:

library(dplyr)
airquality %>%
  group_by(City, year) %>%
  summarise_at(vars("PM25", "Ozone", "CO2"), mean)

Or using across(), which at the time of writing was only in the devel version of dplyr (version ‘0.8.99.9000’) and has since been released in dplyr 1.0.0:

airquality %>%
  group_by(City, year) %>%
  summarise(across(PM25:CO2, mean))

