Calculate Mean for Multiple Columns in Data.Frame

Row-wise average for a subset of columns with missing values

You can simply call .mean() along the rows:

df['avg'] = df.mean(axis=1)

       Monday  Tuesday  Wednesday        avg
Mike       42      NaN         12  27.000000
Jenna     NaN      NaN         15  15.000000
Jon        21        4          1   8.666667

This works because .mean() skips missing values by default (skipna=True); see the docs.
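As a minimal sketch of that default, the frame above can be rebuilt by hand and the two skipna settings compared. The constructor call is an assumption about how the data was created; only the values come from the output shown.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; the constructor call is illustrative.
df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

# Default (skipna=True): NaN cells are left out of each row's average.
avg_skip = df.mean(axis=1)

# skipna=False: any NaN in a row makes that row's mean NaN.
avg_strict = df.mean(axis=1, skipna=False)

print(avg_skip)
print(avg_strict)
```

With skipna=False only Jon, who has no missing days, gets a numeric average.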

To average only a subset of the columns, select them first:

df['avg'] = df[['Monday', 'Tuesday']].mean(axis=1)

       Monday  Tuesday  Wednesday   avg
Mike       42      NaN         12  42.0
Jenna     NaN      NaN         15   NaN
Jon        21        4          1  12.5

Create a column which is the mean of multiple columns in a data frame in pandas

The default behavior of DataFrame.mean() should do what you want.

Here's an example that takes the mean over a subset of the columns and places it in a newly created column:

In [19]: tmp
Out[19]:
   a  b    c
0  1  2  5.0
1  2  3  6.0
2  3  4  NaN

In [24]: tmp['mean'] = tmp[['b', 'c']].mean(axis=1)

In [25]: tmp
Out[25]:
   a  b    c  mean
0  1  2  5.0   3.5
1  2  3  6.0   4.5
2  3  4  NaN   4.0

As for what's going wrong in your code:

s['Q222'] = s['Q222'].replace(['18-24', '25-34','35-44', '45-54','55-64', '65-74', '75 or older', "Don't know"],
['2','3','4','5', '6', '7', '8', np.NaN])

You don't have numeric values (i.e., 2, 3, 4) in your data frame; you have strings ('2', '3', and '4'). DataFrame.mean() does not treat these strings as numbers, so you are getting NaN as the result for all your mean calculations.

Try filling your frame with numbers, like so:

s['Q222'] = s['Q222'].replace(['18-24', '25-34', '35-44', '45-54', '55-64', '65-74', '75 or older', "Don't know"],
                              [2, 3, 4, 5, 6, 7, 8, np.nan])
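A minimal sketch of the same recoding done with a dictionary and Series.map, which can be easier to read than two parallel lists. The frame and the second column, Q223, are hypothetical and exist only so there is something to average with.

```python
import pandas as pd

# Hypothetical survey frame; 'Q222' mirrors the column in the question,
# 'Q223' is made up so there is a second column to average with.
s = pd.DataFrame(
    {
        "Q222": ["18-24", "35-44", "Don't know"],
        "Q223": [4, 6, 8],
    }
)

# Map each age band to a number; labels missing from the dict
# (here "Don't know") become NaN.
age_codes = {
    "18-24": 2, "25-34": 3, "35-44": 4, "45-54": 5,
    "55-64": 6, "65-74": 7, "75 or older": 8,
}
s["Q222"] = s["Q222"].map(age_codes)

# The column is now float64, so mean() can use it and skips the NaN.
s["mean"] = s[["Q222", "Q223"]].mean(axis=1)

print(s.dtypes)
print(s)
```

If the column already holds digit strings such as '2' and '3', pd.to_numeric(s['Q222'], errors='coerce') is one way to convert them in place of redoing the replacement.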

Calculate new column as the mean of other columns in pandas

An easy way to solve this problem is shown below:

col = df.loc[:, "salary_1":"salary_3"]

Here "salary_1" is the first column in the range and "salary_3" is the last. Label slices with .loc include both endpoints.

df['salary_mean'] = col.mean(axis=1)
df

This adds a new column to the data frame holding the row-wise mean of the selected columns.
This approach is helpful when you have a large number of columns, or when you need to average only some selected columns rather than all of them.
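The label slice can be sketched end to end as follows. The frame and its values are hypothetical; the "name" and "bonus" columns are there only to show what the slice leaves out.

```python
import pandas as pd

# Hypothetical frame; the salary column names follow the answer above.
df = pd.DataFrame(
    {
        "name": ["A", "B"],
        "salary_1": [100, 200],
        "salary_2": [110, 220],
        "salary_3": [120, 240],
        "bonus": [5, 10],
    }
)

# Label slices with .loc include BOTH endpoints, so this selects
# salary_1, salary_2 and salary_3 (but not name or bonus).
col = df.loc[:, "salary_1":"salary_3"]

# Row-wise mean of the selected columns only.
df["salary_mean"] = col.mean(axis=1)

print(list(col.columns))
print(df)
```

The slice follows the column order in the frame, so it only makes sense when the columns of interest are adjacent.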

Calculate mean from multiple columns

Just call mean() a second time, on the column means, to get the mean of those 12 values:

df.mean().mean()
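One caveat: the mean of the column means equals the mean of all values only when every column has the same number of non-missing entries. A small sketch with a hypothetical frame shows the difference, using np.nanmean over the underlying array as one possible way to average every cell directly.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with an uneven number of missing values per column.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, np.nan, np.nan]})

# Mean of the column means: each COLUMN counts equally.
mean_of_means = df.mean().mean()

# Mean over every non-missing cell: each VALUE counts equally.
overall_mean = np.nanmean(df.to_numpy())

print(mean_of_means)  # (2.0 + 10.0) / 2
print(overall_mean)   # (1 + 2 + 3 + 10) / 4
```

When the frame has no missing values, both expressions give the same number.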

Find mean of multiple columns in R

We can use colMeans on the selected columns, take the mean of the result, and assign it to a new column (no packages are needed). Note that this yields a single number, which is recycled to every row.

df$values_acceptance <- mean(colMeans(df[c('values', 'acceptance')], na.rm = TRUE))

-output

> df
  values acceptance diffusion attitudes values_acceptance
1      9          8         9         7          7.833333
2      8          8         8         7          7.833333
3     NA         NA         7         6          7.833333
4      8          6        NA        NA          7.833333

Or, if we want to use dplyr:

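library(dplyr)
# Lambda form: passing extra arguments (na.rm) through across() is deprecated in recent dplyr
df %>%
  mutate(values_acceptance = mean(unlist(across(c(values, acceptance),
                                                ~ mean(.x, na.rm = TRUE)))))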

-output

  values acceptance diffusion attitudes values_acceptance
1      9          8         9         7          7.833333
2      8          8         8         7          7.833333
3     NA         NA         7         6          7.833333
4      8          6        NA        NA          7.833333

Compute the mean of two columns in a dataframe

We can use rowMeans

a$mean <- rowMeans(a[, c('high', 'low')], na.rm = TRUE)

NOTE: If there are NA values, it is better to use rowMeans

For example

a <- data.frame(High = c(NA, 3, 2), low = c(3, NA, 0))
rowMeans(a, na.rm = TRUE)
#[1] 3 3 1

and using + after replacing the NA values with 0:

a1 <- replace(a, is.na(a), 0)
(a1[1] + a1[2])/2
#  High
#1  1.5
#2  1.5
#3  1.0

NOTE: This is in no way trying to tarnish the other answer. It works in most cases and is fast as well.
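For readers working in pandas rather than R, the same contrast can be sketched with the values from the example above. This is an illustrative translation, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same values as the R example above, rebuilt as a pandas frame.
a = pd.DataFrame({"High": [np.nan, 3, 2], "low": [3, np.nan, 0]})

# Row-wise mean that skips missing values, like rowMeans(..., na.rm = TRUE).
row_mean = a.mean(axis=1)

# Zero-fill, add and halve: missing cells are counted as real zeros,
# which drags the first two rows down from 3 to 1.5.
zero_filled = (a["High"].fillna(0) + a["low"].fillna(0)) / 2

print(row_mean.tolist())
print(zero_filled.tolist())
```

The two approaches agree only on the last row, which has no missing values.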
