Row-wise average for a subset of columns with missing values
You can simply take the row-wise mean:
df['avg'] = df.mean(axis=1)
       Monday  Tuesday  Wednesday        avg
Mike       42      NaN         12  27.000000
Jenna     NaN      NaN         15  15.000000
Jon        21        4          1   8.666667
This works because .mean() ignores missing values by default (skipna=True): see docs.
To select a subset, you can:
df['avg'] = df[['Monday', 'Tuesday']].mean(axis=1)
       Monday  Tuesday  Wednesday   avg
Mike       42      NaN         12  42.0
Jenna     NaN      NaN         15   NaN
Jon        21        4          1  12.5
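As a minimal sketch of the behavior described above, the frame below is rebuilt by hand from the printed output (the construction itself is an assumption, since the original answer does not show it). It contrasts the default skipna=True with skipna=False, which returns NaN for any row containing a missing value:

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the printed output above (assumed construction).
df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

day_cols = ["Monday", "Tuesday", "Wednesday"]

# Default: missing values are skipped, so each row averages what is present.
avg_all = df[day_cols].mean(axis=1)

# Subset: a row whose selected values are all missing yields NaN.
avg_subset = df[["Monday", "Tuesday"]].mean(axis=1)

# skipna=False: any missing value in the row makes the result NaN.
avg_strict = df[day_cols].mean(axis=1, skipna=False)

print(avg_all)
print(avg_subset)
print(avg_strict)
```

Selecting the day columns explicitly, rather than calling df.mean(axis=1) on the whole frame, keeps the result stable if an 'avg' column has already been added.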
Create a column which is the mean of multiple columns in a data frame in pandas
The default behavior of DataFrame.mean() should do what you want.
Here's an example that takes the mean over a subset of the columns and places it in a newly created column:
In [19]: tmp
Out[19]:
   a  b    c
0  1  2  5.0
1  2  3  6.0
2  3  4  NaN
In [24]: tmp['mean'] = tmp[['b', 'c']].mean(axis=1)
In [25]: tmp
Out[25]:
   a  b    c  mean
0  1  2  5.0   3.5
1  2  3  6.0   4.5
2  3  4  NaN   4.0
As for what's going wrong in your code:
s['Q222'] = s['Q222'].replace(['18-24', '25-34','35-44', '45-54','55-64', '65-74', '75 or older', "Don't know"],
['2','3','4','5', '6', '7', '8', np.NaN])
You don't have numerical values (i.e. 2, 3, 4) in your data frame; you have strings ('2', '3', and '4'). DataFrame.mean() cannot average strings. Depending on the pandas version, it either silently skips such columns, so you get NaN as the result for all your mean calculations, or raises a TypeError.
Try filling your frame with numbers, like so:
s['Q222'] = s['Q222'].replace(['18-24', '25-34', '35-44', '45-54', '55-64', '65-74', '75 or older', "Don't know"],
                              [2, 3, 4, 5, 6, 7, 8, np.nan])
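A small sketch of the same fix follows. It uses Series.map with a dictionary rather than the replace call above (a swapped-in technique, not the original answer's code), and the sample answers and the Q222_code column name are made up for illustration. It also shows pd.to_numeric as one way to recover if the codes have already been stored as strings:

```python
import numpy as np
import pandas as pd

# Hypothetical survey answers; only the column name Q222 comes from the question.
s = pd.DataFrame({"Q222": ["18-24", "35-44", "Don't know", "65-74"]})

# Map each age band to a numeric code; "Don't know" becomes a missing value.
age_codes = {
    "18-24": 2, "25-34": 3, "35-44": 4, "45-54": 5,
    "55-64": 6, "65-74": 7, "75 or older": 8, "Don't know": np.nan,
}
s["Q222_code"] = s["Q222"].map(age_codes)

# The new column is float, so mean() works and skips the NaN.
code_mean = s["Q222_code"].mean()

# If the codes are already strings ('2', '4', ...), convert them to numbers first.
codes_as_text = pd.Series(["2", "4", None, "7"])
codes_numeric = pd.to_numeric(codes_as_text, errors="coerce")

print(s)
print(code_mean, codes_numeric.mean())
```

Note that map turns any value missing from the dictionary into NaN, so a misspelled age band would silently become missing rather than stay as text.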
Calculate new column as the mean of other columns in pandas
An easy way to solve this problem is shown below:
col = df.loc[:, "salary_1":"salary_3"]
where "salary_1" is the first column name and "salary_3" is the last column name in the range. Label-based slices with .loc include both endpoints.
df['salary_mean'] = col.mean(axis=1)
df
This adds a new column to df that holds the row-wise mean of the selected columns.
This approach is helpful when you have a large number of columns, or when you need to average only some selected columns rather than all of them.
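To make the slicing concrete, here is a minimal sketch with a made-up frame. The name and bonus columns and all values are assumptions added to show that columns outside the label range are left out of the mean:

```python
import pandas as pd

# Hypothetical data; only the salary_* column names come from the answer above.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob"],
        "salary_1": [100, 200],
        "salary_2": [110, 220],
        "salary_3": [120, 240],
        "bonus": [5, 10],
    }
)

# Label slice: selects salary_1 through salary_3, both endpoints included.
col = df.loc[:, "salary_1":"salary_3"]

# Row-wise mean of the selected columns only; name and bonus are ignored.
df["salary_mean"] = col.mean(axis=1)

print(df)
```

The slice depends on column order, so it only picks up the intended columns when they sit next to each other in the frame.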
Calculate mean from multiple columns
Just call the mean again to get the mean of those 12 values:
df.mean().mean()
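One caveat worth checking: the mean of the column means equals the mean of all values only when every column has the same number of non-missing entries. The sketch below uses made-up frames to show both cases, with np.nanmean over the underlying array as one possible way to average all values directly:

```python
import numpy as np
import pandas as pd

# Complete data: every column has 3 values, so both approaches agree.
full = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
full_mean_of_means = full.mean().mean()      # (2.0 + 5.0) / 2 = 3.5
full_overall = np.nanmean(full.to_numpy())   # 21.0 / 6 = 3.5

# Missing data: column b has a single value, so its column mean is over-weighted.
gappy = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, np.nan, np.nan]})
gappy_mean_of_means = gappy.mean().mean()    # (2.0 + 10.0) / 2 = 6.0
gappy_overall = np.nanmean(gappy.to_numpy()) # (1 + 2 + 3 + 10) / 4 = 4.0

print(full_mean_of_means, full_overall)
print(gappy_mean_of_means, gappy_overall)
```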
Find mean of multiple columns in R
We can use colMeans on the selected columns, take the mean of the result, and assign the output to create a new column (no packages are needed):
df$values_acceptance <- mean(colMeans(df[c('values', 'acceptance')], na.rm = TRUE))
-output
> df
  values acceptance diffusion attitudes values_acceptance
1      9          8         9         7          7.833333
2      8          8         8         7          7.833333
3     NA         NA         7         6          7.833333
4      8          6        NA        NA          7.833333
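Or, if we need dplyr (the lambda form avoids passing extra arguments through across, which is deprecated as of dplyr 1.1.0):
library(dplyr)
df %>%
  mutate(values_acceptance = mean(unlist(across(c(values, acceptance),
    ~ mean(.x, na.rm = TRUE)))))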
-output
  values acceptance diffusion attitudes values_acceptance
1      9          8         9         7          7.833333
2      8          8         8         7          7.833333
3     NA         NA         7         6          7.833333
4      8          6        NA        NA          7.833333
Compute the mean of two columns in a dataframe
We can use rowMeans
a$mean <- rowMeans(a[,c('high', 'low')], na.rm=TRUE)
NOTE: If there are NA values, it is better to use rowMeans, because na.rm = TRUE averages only the values that are present in each row.
For example:
a <- data.frame(High= c(NA, 3, 2), low= c(3, NA, 0))
rowMeans(a, na.rm=TRUE)
#[1] 3 3 1
whereas replacing the NA values with 0 and using + pulls the affected means toward zero:
a1 <- replace(a, is.na(a), 0)
(a1[1] + a1[2])/2
#  High
#1  1.5
#2  1.5
#3  1.0
NOTE: This is in no way trying to tarnish the other answer. It works in most cases and is fast as well.